Abstract

A rectifier network is a directed acyclic graph with distinguished sources and sinks; it is said to 
compute a Boolean matrix M that has a 1 in the entry (i,j) iff there is a path from the jth 
source to the ith sink. The smallest number of edges in a rectifier network that computes M is 
a classic complexity measure on matrices, which has been studied for more than half a century. 

We explore two well-known techniques that have hitherto found little to no applications in 
this theory. Both of them build upon a basic fact that depth-2 rectifier networks are essentially 
weighted coverings of Boolean matrices with rectangles. We obtain new results by using fractional 
and greedy coverings (defined in the standard way). 

First, we show that all fractional coverings of the so-called full triangular matrix have cost at least $n \log n$. This provides (a fortiori) a new proof of the tight lower bound on its depth-2 complexity (the exact value has been known since 1965, but previous proofs are based on different arguments). Second, we show that the greedy heuristic is instrumental in tightening the upper bound on the depth-2 complexity of the Kneser-Sierpiński (disjointness) matrix. The previous upper bound is $O(n^{1.28})$, and we improve it to $O(n^{1.17})$, while the best known lower bound is $\Omega(n^{1.16})$. Third, using fractional coverings, we obtain a form of direct product theorem that gives a lower bound on the unbounded-depth complexity of Kronecker (tensor) products of matrices. In this case, the greedy heuristic shows (by an argument due to Lovász) that our result is only a logarithmic factor away from the “full” direct product theorem. Our second and third results constitute progress on open problem 7.3 and resolve, up to a logarithmic factor, open problem 7.5 from a recent book by Jukna and Sergeev (in Foundations and Trends in Theoretical Computer Science (2013)).

Digital Object Identifier 10.4230/LIPIcs.xxx.yyy.p

1 Introduction

Introduced in the 1950s, rectifier networks are one of the oldest and most basic models in the theory of computing. They are directed acyclic graphs with distinguished input and output nodes; a rectifier network is said to compute (or express) the Boolean matrix M that has a 1 in the entry (i, j) iff there is a path from the jth input to the ith output. Equivalently, rectifier networks can be viewed as Boolean circuits that consist entirely of OR gates of arbitrary fan-in. This simple model of computation has attracted a lot of attention […], because it captures the “topological” core of other models: complexity bounds for rectifier networks extend in one way or another to Boolean circuits (i.e., circuits with Boolean gates) and to switching circuits […].

Given a matrix M, what is the smallest number of edges in a rectifier network that computes M? Denote this number by OR(M); this is a complexity measure on Boolean matrices. This measure is fairly well understood: we know, from Nechiporuk [30], that the maximum of OR(M) grows as $n^2/(2\log n)$ as $n \to \infty$ if M is $n \times n$; we also know that random $n \times n$ matrices have complexity very close to $n^2/(2\log n)$. The “shape” of these two facts is reminiscent of the standard circuit complexity of Boolean functions over AND, OR, and NOT gates, but for them, the maximum is $2^n/n$ instead of $n^2/(2\log n)$.

However, much more is known about the measure OR(·): there are explicit sequences of matrices that have complexity close to the maximum (in contrast, for circuits over AND, OR, and NOT gates, exhibiting a single sequence of functions that require a superlinear number of gates would be a tremendous breakthrough). In fact, nowadays a range of methods are available for obtaining upper and lower bounds on OR(M) for specific matrices M; we refer the interested reader to the recent book by Jukna and Sergeev [15].

Many natural questions, however, remain open. Jukna and Sergeev list 19 open problems about OR(·) and related complexity measures. Several of them refer to very restricted submodels, such as rectifier networks of depth 2: that is, networks where all paths contain (at most) 2 edges. A depth-2 rectifier network expressing a matrix M is essentially a covering of M: a collection of (rectangular) all-1 submatrices of M whose disjunction is M. In our work, we look into the corresponding complexity measure $\mathrm{OR}_2(\cdot)$ as well as OR(·). We build upon the connection between rectifier networks and (weighted) set coverings and explore two well-known ideas that have previously found few applications in the study of rectifier networks: they are associated with fractional and greedy coverings, respectively.

Fractional coverings are a generalization of usual set coverings. In the usual set cover problem, each set S can be either included or not included in the solution (i.e., in the covering); in the fractional version each set can be partially included: a solution assigns to each set S a real number $x_S \in [0, 1]$, and for every element s of the universe the sum $\sum_{S \ni s} x_S$ should be equal to or exceed 1. In other words, fractional coverings arise from the linear relaxation of the integer program that expresses the set cover problem. Greedy coverings are, in contrast, usual coverings; they are the outcome of applying the standard greedy heuristic to an instance of the set cover problem: at each step, the algorithm picks a set S that covers the largest number of yet uncovered elements s. In our work, we use fractional and greedy coverings to obtain estimates on the values of $\mathrm{OR}_2(M)$ and $\mathrm{OR}(M)$.

Our results 

First, we demonstrate that $\mathrm{OR}_2(T_n) = n(\lfloor \log_2 n \rfloor + 2) - 2^{\lfloor \log_2 n \rfloor + 1}$, where $T_n$ is the so-called full triangular matrix: an upper-triangular matrix that has 1s everywhere above the main diagonal and 0s on the diagonal and below. In this problem, the upper bound is easy and the challenge is to prove the lower bound. This was previously done by Krichevskii [20], and our paper provides a different proof of independent interest. In fact, we prove a stronger statement: all fractional coverings of $T_n$ have large associated cost (Theorem 4). To this end, we take the linear program that expresses the fractional set cover problem and find a good feasible solution to the dual program. The value of this solution then gives a lower bound on the cost of all feasible solutions to the primal, that is, on the cost of fractional coverings. Since integral coverings are just a special case of fractional coverings, the result follows.

Second, we improve the upper bound on the value of $\mathrm{OR}_2(D_n)$, where $D_n$ is the disjointness matrix, also known as the Kneser-Sierpiński matrix. This constitutes progress on open problem 7.3 in Jukna and Sergeev’s book [15], where the previously known bounds are obtained. The previous upper bound is $O(n^{1.28})$, and our Theorem 8 improves it to $O(n^{1.17})$, while the best known lower bound is $\Omega(n^{1.16})$. To achieve this improvement, we subdivide the instance of the weighted set cover problem (in which the optimal value is $\mathrm{OR}_2(D_n)$) into polylog(n) natural subproblems and reduce them, by imposing an additional restriction, to instances of unweighted set cover problems. We then solve these instances with the greedy heuristic; the upper bound in the analysis invokes the so-called greedy covering lemma by Sapozhenko […], also known as the Lovász-Stein theorem […]. This gives us the desired upper bound on $\mathrm{OR}_2(D_n)$; in fact, the greedy strategy turns out to be optimal, and the optimal exponent in $\mathrm{OR}_2(D_n)$ comes from a numerical optimization problem. As an intermediate result we determine, up to a polylogarithmic factor, the value of $\mathrm{OR}_2(D_n^m)$, where $n = 2^k$ and $D_n^m$ is the adjacency matrix of the Kneser graph on $2\binom{k}{m}$ vertices.

Finally, we obtain (Theorem 13) a form of direct product theorem for the OR(·) measure: $\mathrm{OR}(K \otimes M) \ge \mathrm{rk}^*_\vee(K) \cdot \mathrm{OR}(M)$. Here $K \otimes M$ denotes the Kronecker product of matrices K and M, and $\mathrm{rk}^*_\vee(K)$ is a fractional analogue of the Boolean rank of K. This resolves, up to a logarithmic factor, open problem 7.5 in the list of Jukna and Sergeev [15], which asks for the lower bound of $\mathrm{rk}_\vee(K) \cdot \mathrm{OR}(M)$, where $\mathrm{rk}_\vee(K) \ge \mathrm{rk}^*_\vee(K)$ is the Boolean rank of K. (In fact, a related question for unambiguous rectifier networks, or SUM-circuits, is originally due to Find et al. […]; our technique applies to this model as well, giving an analogous inequality for the measure SUM(·); see Corollary 15.) Suppose K is an $m \times n$ matrix; then, by the argument due to Lovász [23], the greedy heuristic shows that $\mathrm{rk}^*_\vee(K) \ge \mathrm{rk}_\vee(K)/(1 + \log mn)$, so our lower bound is indeed at most a logarithmic factor away from the “full” direct product theorem. To prove our lower bound, we take the linear programming formulation of the fractional set cover problem for the matrix K and use components of the optimal solution to the dual program to guide our argument. It is interesting to see how reasoning about coverings, or, equivalently, about depth-2 rectifier networks, enables us to obtain meaningful lower bounds on the size of rectifier networks that have unbounded depth.


2 Discussion and related work

We use the matrix language in this paper, but all results can be restated in terms of biclique 
coverings of bipartite graphs. 

The $\mathrm{OR}_2$-complexity of full triangular matrices, $T_n$, is tightly related to results on biclique coverings of complete undirected (non-bipartite) graphs from the early days of the theory of computing. The $n \log n$ lower bound, in one form or another, was known to Hansel […], Krichevskii [20], Katona and Szemerédi […], and Tarjan […].¹ Apart from purely combinatorial considerations, the interest in this problem is motivated by its applications in formula and switching-circuit complexity of the Boolean threshold-2 function (which takes on the value 1 if and only if at least two of its inputs are set to 1). For more context, see treatments by Radhakrishnan [33] and Lozhkin […]. Our lower bound is obtained in a slightly more restrictive setting, because of explicit asymmetry: for $\mathrm{OR}_2(T_n)$, one needs to cover entries (i, j) with i < j in the matrix; in biclique coverings of undirected graphs, it suffices to cover either of (i, j) and (j, i). Nevertheless, to the best of our knowledge, ours is the only proof that goes via linear programming (LP) duality and provides a tight lower bound on the size of fractional coverings. This result is new; we are not aware of other lower bounds for rectifier networks that come from feasible solutions to the LP dual (in approximation algorithms, a related technique is known under the name of “dual fitting” […, Section 9.4]).

As for the greedy heuristics, we are not the first to use them in the context of depth-2 rectifier networks. Andreev [1] obtained a tight worst-case upper bound for a class of matrices potentially containing “wildcard” entries (*). This upper bound is in terms of the number of occurrences of 0s and 1s, provided that these numbers satisfy certain conditions as the matrix size tends to infinity. Our Theorem 8, however, does not follow from Andreev’s worst-case bound. The disjointness matrix, $D_n$, which we apply this technique to, is a well-studied object in communication complexity […]; it is a discrete version of the Sierpiński triangle. Boyar and Find [5] and Selezneva [35] proved that $\mathrm{OR}(D_n) = \Theta(n \log n)$ and $\mathrm{SUM}(D_n) = \Theta(n \log n)$.² In depth 2, the previous bounds are due to Jukna and Sergeev [15]; it is unknown if greedy heuristics are also of use for SUM-circuits, as our upper bound for $D_n$ does not extend to this model (our coverings are not partitions).

¹ Not all of these arguments compute the exact value of $\mathrm{OR}_2(T_n)$.

Direct sum and direct product theorems in the theory of computing are statements of the following form: when faced with several instances of the same problem on different independent inputs, there is no better strategy than solving each instance independently.³ For rectifier networks, these questions are associated with the complexity of Kronecker (tensor) products of matrices. Indeed, denote the $k \times k$ identity matrix by $I_k$; then $I_k \otimes M$ is the block-diagonal matrix with k copies of M on the diagonal. It is not difficult to show that $\mathrm{OR}(I_k \otimes M) \ge k \cdot \mathrm{OR}(M)$, and a natural generalization asks whether $\mathrm{OR}(K \otimes M) \ge \mathrm{rk}_\vee(K) \cdot \mathrm{OR}(M)$ for any matrix K; see Find et al. […] and Jukna and Sergeev [15, Sections 2.4, 3.6, and open problem 7.5]. To date, this inequality is only known to hold in special cases. For example, Find et al. can show this lower bound when the matrix K has a fooling set of size $\mathrm{rk}_\vee(K)$; however, the size of the largest fooling set does not approximate the Boolean rank, as observed, e.g., by Gruber and Holzer [3] (they use the graph-theoretic language, with bipartite dimension instead of $\mathrm{rk}_\vee$). As another example, denote by |M| the number of 1s in the matrix M and assume that M has no all-1 submatrices of size $(k + 1) \times (l + 1)$. Then the inequality $\mathrm{OR}(M) \ge |M|/(kl)$ is a well-known lower bound due to Nechiporuk [31], subsequently rediscovered by Mehlhorn […], Pippenger [33], and Wegener [33]; Jukna and Sergeev [15, Theorem 3.20] extend it to $\mathrm{OR}(K \otimes M) \ge \mathrm{rk}_\vee(K) \cdot |M|/(kl)$ for any square matrix K. To the best of our knowledge, the current literature has no stronger lower bounds on the OR-complexity of Kronecker products; our Theorem 13 comes logarithmically close to the desired bound. For SUM-complexity, the state of the art and our contribution are analogous to the OR-case. The related notion of a fractional biclique cover has previously appeared, e.g., in the papers of Watts [33] and Jukna and Kulikov […].
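For concreteness, here is a minimal sketch of the Boolean Kronecker product (the helper names `kron` and `identity` are ours, not the paper's): block (a, b) of $K \otimes M$ is M when $K_{ab} = 1$ and the zero matrix otherwise, so $I_k \otimes M$ is block diagonal with k copies of M.

```python
def kron(K, M):
    """Boolean Kronecker product: block (a, b) of the result is M if K[a][b] = 1, else 0."""
    return [[K[a][b] & M[i][j] for b in range(len(K[0])) for j in range(len(M[0]))]
            for a in range(len(K)) for i in range(len(M))]


def identity(k):
    """The k x k identity matrix I_k."""
    return [[int(a == b) for b in range(k)] for a in range(k)]


M = [[1, 1],
     [0, 1]]
for row in kron(identity(2), M):
    print(row)  # one row per line: [1, 1, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]
```

Row index a·|rows(M)| + i and column index b·|cols(M)| + j address entry (i, j) of block (a, b), which is the usual layout of the Kronecker product.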

Also related to our work is the study of the size of smallest biclique coverings, under the name of the bipartite dimension of a graph (as opposed to the cost of such coverings and the $\mathrm{OR}_2$-complexity; see Section 3). This quantity corresponds to the Boolean rank of a matrix and is known to be NP-hard to compute [5] and NP-hard to approximate to within a factor of … [3]. Finally, we note that results on $\mathrm{OR}_2$-complexity have corollaries for descriptional complexity of regular languages. Indeed, take a language where all words have length two, $L \subseteq \Sigma \cdot \Delta$, with $\Sigma = \{a_1, \dots, a_m\}$ and $\Delta = \{b_1, \dots, b_n\}$. Let $M^L$ be its characteristic $m \times n$ matrix: $M^L_{ij} = 1$ iff $a_i \cdot b_j \in L$. Then $\mathrm{OR}_2(M^L)$ coincides with the alphabetic length of the shortest regular expression for L; for example, it follows from Corollary 5 that the optimal regular expression for the language $L_n = \{a_i a_j \mid 1 \le i < j \le n\}$ has $n(\lfloor \log_2 n \rfloor + 2) - 2^{\lfloor \log_2 n \rfloor + 1}$ occurrences of letters ($\Sigma = \Delta = \{a_1, \dots, a_n\}$). The values of $\mathrm{OR}(M^L)$ and $\mathrm{OR}_2(M^L)$ are also related to the size of the smallest nondeterministic finite automata accepting L; see […] and the appendix for details.
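The correspondence between coverings and regular expressions can be illustrated with a short sketch. It is our own code, under two assumptions: the letter $a_{i+1}$ is encoded as the i-th lowercase Latin letter (so n ≤ 26), and the covering of $T_n$ is a divide-and-conquer covering of our choosing. Each rectangle (R, C) becomes the term $(a_R)(a_C)$, with alternatives inside the parentheses, so the number of letter occurrences equals the cost of the covering.

```python
import re
from itertools import product
from string import ascii_lowercase


def dc_cover(lo, hi):
    """Divide-and-conquer covering of {(i, j): lo <= i < j < hi} (our choice of split)."""
    if hi - lo < 2:
        return []
    mid = (lo + hi) // 2
    return [(range(lo, mid), range(mid, hi))] + dc_cover(lo, mid) + dc_cover(mid, hi)


def regex_for_Ln(n):
    """Regular expression for L_n = {a_i a_j : i < j}, one term (a_R)(a_C) per rectangle."""
    letters = ascii_lowercase[:n]
    terms = ["(" + "|".join(letters[i] for i in R) + ")("
             + "|".join(letters[j] for j in C) + ")"
             for R, C in dc_cover(0, n)]
    return "|".join(terms)


rx = regex_for_Ln(4)
print(rx)  # (a|b)(c|d)|(a)(b)|(c)(d)
words = ["".join(p) for p in product("abcd", repeat=2) if re.fullmatch(rx, "".join(p))]
print(words)  # ['ab', 'ac', 'ad', 'bc', 'bd', 'cd']
```

Here the expression for $L_4$ uses 8 letter occurrences, which matches $n(\lfloor \log_2 n \rfloor + 2) - 2^{\lfloor \log_2 n \rfloor + 1} = 8$ for n = 4.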


² Recall that the SUM(·) measure corresponds to unambiguous rectifier networks, in which every input-output pair is connected by at most one path; or, equivalently, to arithmetic circuits over nonnegative integers with addition (SUM) gates. For any matrix M, $\mathrm{OR}(M) \le \mathrm{SUM}(M)$ and $\mathrm{OR}_2(M) \le \mathrm{SUM}_2(M)$.

³ In some contexts, the terms “direct sum theorem” and “direct product theorem” have slightly different meanings […], but in the current context we do not distinguish between them.
(c) Rectifier network of depth 2 [image not reproduced]

Figure 1 Illustrations for Example 1

3 Rectifier networks and coverings
Rectifier networks 

Define a rectifier network with n inputs and m outputs as a 4-tuple $\mathcal{N} = (V, E, \mathrm{in}, \mathrm{out})$, where V is a set of vertices, $E \subseteq V^2$ a set of edges such that the directed graph $G_{\mathcal{N}} = (V, E)$ is acyclic, and $\mathrm{in}\colon \{1, \dots, n\} \to V$ and $\mathrm{out}\colon \{1, \dots, m\} \to V$ are injective functions whose images contain only sources (and, respectively, only sinks) of $G_{\mathcal{N}}$. The network $\mathcal{N}$ is said to have size |E|.

A rectifier network $\mathcal{N}$ expresses a Boolean $m \times n$ matrix $M = M(\mathcal{N})$ such that $M_{ij} = 1$ if $G_{\mathcal{N}}$ contains a directed path from $\mathrm{in}(j)$ to $\mathrm{out}(i)$ and $M_{ij} = 0$ otherwise. A rectifier network $\mathcal{N}$ is said to have depth d if all maximal paths in $G_{\mathcal{N}}$ have exactly d edges. Given a Boolean matrix $A \in \{0,1\}^{m \times n}$, let $\mathrm{OR}_2(A)$ denote the smallest size of a depth-2 rectifier network that expresses A, and let $\mathrm{OR}(A)$ denote the smallest size of any rectifier network that expresses A.

This notation is justified by the following observation. A rectifier network $\mathcal{N}$ may be viewed as a circuit: its Boolean inputs are located at the vertices $\mathrm{in}(\{1, \dots, n\})$, and gates at all other vertices compute the disjunction (Boolean OR) of their inputs. From this point of view, the circuit computes a linear operator over the monoid $(\{0,1\}, \mathrm{OR})$, and the matrix of this linear operator is exactly the Boolean matrix expressed by the rectifier network $\mathcal{N}$.
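To make the circuit view concrete, here is a minimal sketch that evaluates a rectifier network as an OR circuit and checks, on a small network of our own choosing, that it agrees with the Boolean matrix-vector product for the matrix it expresses. The function names (`evaluate`, `expressed_matrix`, `bool_matvec`) and the toy network are hypothetical illustrations, not taken from the paper.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from itertools import product


def evaluate(edges, inputs, outputs, x):
    """Evaluate a rectifier network as an OR circuit on the Boolean vector x.

    edges   -- directed edges (u, v) of an acyclic graph
    inputs  -- inputs[j] is the vertex that carries input bit x[j]
    outputs -- outputs[i] is the vertex whose value is output bit i
    """
    preds = {v: set() for v in list(inputs) + list(outputs)}
    for u, v in edges:
        preds.setdefault(u, set())
        preds.setdefault(v, set()).add(u)
    position = {v: j for j, v in enumerate(inputs)}
    value = {}
    # static_order() lists every vertex after all of its predecessors
    for v in TopologicalSorter(preds).static_order():
        if v in position:
            value[v] = x[position[v]]
        else:
            value[v] = int(any(value[u] for u in preds[v]))
    return [value[v] for v in outputs]


def expressed_matrix(edges, inputs, outputs):
    """Column j of the expressed matrix is the output on the j-th unit vector."""
    n = len(inputs)
    cols = [evaluate(edges, inputs, outputs, [int(t == j) for t in range(n)])
            for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(len(outputs))]


def bool_matvec(M, x):
    """Matrix-vector product over ({0, 1}, OR) with AND as multiplication."""
    return [int(any(a and b for a, b in zip(row, x))) for row in M]


# Hypothetical toy network: the gate g = x0 OR x1 feeds both outputs.
EDGES = [("x0", "g"), ("x1", "g"), ("g", "y0"), ("g", "y1"), ("x2", "y1")]
INPUTS, OUTPUTS = ["x0", "x1", "x2"], ["y0", "y1"]
M = expressed_matrix(EDGES, INPUTS, OUTPUTS)
agrees = all(evaluate(EDGES, INPUTS, OUTPUTS, list(x)) == bool_matvec(M, list(x))
             for x in product((0, 1), repeat=len(INPUTS)))
print(M, agrees)  # [[1, 1, 0], [1, 1, 1]] True
```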


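(a) Rectifier network of depth 3 [image not reproduced]

(b) Matrix B:
$$B = \begin{pmatrix} 1&1&1&1&1&1&1&1\\ 1&1&1&1&1&1&1&1\\ 1&1&1&1&1&1&1&1\\ 1&1&1&1&1&1&1&1\\ 0&0&0&0&1&1&1&1\\ 0&0&0&0&1&1&1&1\\ 0&0&0&0&1&1&1&1\\ 0&0&0&0&1&1&1&1 \end{pmatrix}$$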


Example 1. A depth-3 rectifier network is shown in Figure 1a. It expresses the matrix B in Figure 1b, showing that $\mathrm{OR}_3(B) \le 19$. In fact, this network is optimal and $\mathrm{OR}_3(B) = 19$; see the appendix for details. At the same time, $\mathrm{OR}_2(B) = 20$: the upper bound is achieved by the network in Figure 1c, and the lower bound is due to Jukna and Sergeev [15, Theorem 3.18].
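The upper-bound claims of Example 1 can be checked mechanically. The sketch below assumes that B is the $8 \times 8$ matrix of Figure 1b (top four rows all 1, bottom four rows equal to 00001111) and builds one possible depth-3 network with 19 edges. This network is our own construction and need not coincide with the one drawn in Figure 1a. The code verifies the expressed matrix, the edge count, the exact depth, and the cost 20 of a two-rectangle covering.

```python
def build_network():
    """A hypothetical 19-edge depth-3 network for B (not necessarily Figure 1a)."""
    xs = [f"x{j}" for j in range(8)]          # inputs correspond to columns of B
    ys = [f"y{i}" for i in range(8)]          # outputs correspond to rows of B
    edges = [(x, "u") for x in xs[:4]] + [(x, "v") for x in xs[4:]]
    edges += [("u", "w"), ("v", "w"), ("v", "v2")]   # v2 pads paths to length 3
    edges += [("w", y) for y in ys[:4]] + [("v2", y) for y in ys[4:]]
    return xs, ys, edges


def successors(edges):
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
    return adj


def reachable(adj, start):
    """All vertices reachable from start (iterative DFS)."""
    seen, stack = {start}, [start]
    while stack:
        for b in adj.get(stack.pop(), []):
            if b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def maximal_path_lengths(adj, sources):
    """Lengths of all source-to-sink paths (exhaustive DFS; fine for tiny graphs)."""
    lengths = set()

    def dfs(v, d):
        if v not in adj:            # no outgoing edges: v is a sink
            lengths.add(d)
        for b in adj.get(v, []):
            dfs(b, d + 1)

    for s in sources:
        dfs(s, 0)
    return lengths


xs, ys, edges = build_network()
adj = successors(edges)
B = [[int(ys[i] in reachable(adj, xs[j])) for j in range(8)] for i in range(8)]

# A covering of B by two rectangles (rows R, columns C), 0-indexed.
cover = [(range(0, 4), range(0, 8)), (range(4, 8), range(4, 8))]
cost = sum(len(R) + len(C) for R, C in cover)
print(len(edges), maximal_path_lengths(adj, xs), cost)  # 19 {3} 20
```

Optimality (the matching lower bounds 19 and 20) is not checked here; it relies on the appendix and on Jukna and Sergeev's theorem.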


Coverings of Boolean matrices 

Let us describe an alternative way of defining the function $\mathrm{OR}_2(\cdot)$. Given a Boolean matrix $A \in \{0,1\}^{m \times n}$, a rectangle (or a 1-rectangle) is a pair (R, C), where $R \subseteq \{1, \dots, m\}$ and $C \subseteq \{1, \dots, n\}$, such that for all $(i, j) \in R \times C$ we have $A_{ij} = 1$. A rectangle (R, C) is said to cover all pairs $(i, j) \in R \times C$. The cost of a rectangle (R, C) is defined as |R| + |C|.

Suppose a matrix A is fixed; then a collection of rectangles is called a covering of A if for every $(i, j) \in \{1, \dots, m\} \times \{1, \dots, n\}$ with $A_{ij} = 1$ there exists a rectangle in the collection that covers (i, j). The cost of a collection is the sum of the costs of all its rectangles.

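Given a Boolean matrix $A \in \{0,1\}^{m \times n}$, the cost of A is defined as the smallest cost of a covering of A. It is not difficult to show that the cost of A equals $\mathrm{OR}_2(A)$ as defined above: a rectangle (R, C) corresponds to a middle vertex of a depth-2 network with |C| incoming and |R| outgoing edges.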
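For very small matrices, the cost of A (and hence $\mathrm{OR}_2(A)$) can be computed exhaustively. The sketch below is a brute-force dynamic program over subsets of 1-entries, provided for illustration only; it enumerates all 1-rectangles, not just maximal ones, since smaller rectangles can be cheaper. The helper names `T` and `s` are ours, with `s` implementing the closed form $n(\lfloor \log_2 n\rfloor + 2) - 2^{\lfloor \log_2 n \rfloor + 1}$ from the introduction.

```python
def rectangle_masks(A):
    """Return (number of 1s, list of (bitmask of covered 1-entries, cost |R| + |C|))."""
    m, n = len(A), len(A[0])
    ones = [(i, j) for i in range(m) for j in range(n) if A[i][j]]
    index = {cell: t for t, cell in enumerate(ones)}
    rects = []
    for rmask in range(1, 1 << m):
        rows = [i for i in range(m) if rmask >> i & 1]
        good = [j for j in range(n) if all(A[i][j] for i in rows)]
        for cmask in range(1, 1 << len(good)):
            cols = [good[t] for t in range(len(good)) if cmask >> t & 1]
            mask = 0
            for i in rows:
                for j in cols:
                    mask |= 1 << index[(i, j)]
            rects.append((mask, len(rows) + len(cols)))
    return len(ones), rects


def min_cover_cost(A):
    """Smallest cost of a covering of A; exponential in the number of 1s (tiny A only)."""
    e, rects = rectangle_masks(A)
    INF = float("inf")
    best = [INF] * (1 << e)
    best[0] = 0
    for mask in range(1 << e):           # mask | r >= mask, so increasing order works
        if best[mask] == INF:
            continue
        for r, c in rects:
            new = mask | r
            if new != mask and best[mask] + c < best[new]:
                best[new] = best[mask] + c
    return best[(1 << e) - 1]


def T(n):
    """Full triangular matrix: 1 above the main diagonal, 0 elsewhere."""
    return [[int(i < j) for j in range(n)] for i in range(n)]


def s(n):
    q = n.bit_length() - 1               # floor(log2 n)
    return n * (q + 2) - 2 ** (q + 1)


print([min_cover_cost(T(n)) for n in range(1, 6)])  # [0, 2, 5, 8, 12]
```

For n ≤ 5 these brute-force values coincide with s(n), in line with Corollary 5 below.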

Similarly, we can think of minimizing the size of a covering, i.e., the number of rectangles in a collection instead of their total cost. The smallest size of a covering of A is called the OR-rank (or the Boolean rank) of A, denoted $\mathrm{rk}_\vee(A)$.
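Similarly, a tiny exhaustive search computes $\mathrm{rk}_\vee(A)$. This is a sketch under the assumption that A has only a handful of rows (the function name is ours). It only tries rectangles (R, C) in which C is the set of all columns that are 1 in every row of R; this is enough because enlarging C never increases the number of rectangles.

```python
from itertools import combinations


def boolean_rank(A):
    """Smallest number of 1-rectangles covering all 1s of A (exhaustive search)."""
    m, n = len(A), len(A[0])
    ones = frozenset((i, j) for i in range(m) for j in range(n) if A[i][j])
    if not ones:
        return 0
    rects = set()
    for rmask in range(1, 1 << m):
        rows = [i for i in range(m) if rmask >> i & 1]
        cols = [j for j in range(n) if all(A[i][j] for i in rows)]  # maximal C for R
        if cols:
            rects.add(frozenset((i, j) for i in rows for j in cols))
    for r in range(1, len(ones) + 1):    # r = |ones| always succeeds
        for choice in combinations(rects, r):
            if frozenset().union(*choice) == ones:
                return r


T4 = [[int(i < j) for j in range(4)] for i in range(4)]
print(boolean_rank(T4))  # 3
```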
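(a) Integer program:
$$\sum_{S \in \mathcal{F}} w(S)\, x_S \to \min \quad \text{subject to} \quad x_S \in \{0, 1\} \text{ for all } S \in \mathcal{F}, \qquad \sum_{S \in \mathcal{F}:\; u \in S} x_S \ge 1 \text{ for all } u \in U.$$

(b) Linear relaxation:
$$\sum_{S \in \mathcal{F}} w(S)\, x_S \to \min \quad \text{subject to} \quad 0 \le x_S \le 1 \text{ for all } S \in \mathcal{F}, \qquad \sum_{S \in \mathcal{F}:\; u \in S} x_S \ge 1 \text{ for all } u \in U.$$

(c) Dual of the linear relaxation:
$$\sum_{u \in U} y_u \to \max \quad \text{subject to} \quad y_u \ge 0 \text{ for all } u \in U, \qquad \sum_{u \in S} y_u \le w(S) \text{ for all } S \in \mathcal{F}.$$

Figure 2 Integer and linear programs for the set cover problem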


4 Fractional and greedy coverings

In the rest of the paper we interpret the covering problems for Boolean matrices as special 
cases of the general set cover problem. In this section we recall this general setting and present 
two main techniques that we apply: linear programming duality and greedy heuristics. 

An instance of the (weighted) set cover problem consists of a set U, a family of its subsets, $\mathcal{F} \subseteq 2^U$, and a weight function, which is a mapping $w\colon \mathcal{F} \to \mathbb{N}$. Every set $S \in \mathcal{F}$ is said to cover all elements $s \in S \subseteq U$. The goal is to find a subfamily $\mathcal{F}' \subseteq \mathcal{F}$ that is a covering (i.e., it covers all elements from U: $\bigcup_{S \in \mathcal{F}'} S = U$) and has the smallest possible total weight (i.e., it minimizes the functional $\sum_{S \in \mathcal{F}'} w(S)$ amongst all coverings). In the unweighted version of the problem, w(S) = 1 for all $S \in \mathcal{F}$, so the total weight of a covering is just its size (number of elements in $\mathcal{F}'$). In both versions, $\mathcal{F}$ is usually assumed to be a feasible solution, which means that every $s \in U$ belongs to at least one set from $\mathcal{F}$: that is, $\bigcup_{S \in \mathcal{F}} S = U$.

It is instructive, throughout this section, to have particular instances of the set cover problem in mind, namely those of covering Boolean matrices with rectangles as in Section 3: the universe U is the set of 1-entries of A, each rectangle contributes the set of entries it covers, and its weight is the cost of the rectangle. In the following sections, we refer to them as weighted and unweighted set covering formulations; their optimal solutions correspond to the values of $\mathrm{OR}_2(A)$ and $\mathrm{rk}_\vee(A)$, respectively.


Fractional coverings 


The set cover problem can easily be recast as an integer program: see Figure 2a. For each $S \in \mathcal{F}$, this program has an integer variable $x_S \in \{0, 1\}$: the interpretation is that $x_S = 1$ if and only if $S \in \mathcal{F}'$, and the constraints require that every element is covered. Feasible solutions are in a natural one-to-one correspondence with coverings of U, and the optimal value in the program is the smallest weight of a covering.

The linear programming relaxation of this integer program is obtained by interpreting variables $x_S$ over reals: see Figure 2b. Now $0 \le x_S \le 1$ for each $S \in \mathcal{F}$. Feasible solutions to this program are called fractional coverings. Suppose the optimal cost in the original set cover problem is $\tau$. Then the integer program in Figure 2a has optimal value $\tau$, and its relaxation in Figure 2b has optimal value $\tau^* \le \tau$.

Finally, define the dual of this linear program: this is also a linear program, and it has a (real) variable $y_u$ for each element $u \in U$; see Figure 2c. This is a maximization problem, and its optimal value coincides with $\tau^*$ by the strong duality theorem.

The following lemma summarizes the properties of these programs needed for the sequel. 


Lemma 2. If $(y_u)_{u \in U}$ is a feasible solution to the dual, then $\sum_{u \in U} y_u \le \tau^*$. There exists a feasible solution to the dual, $(y_u)_{u \in U}$, such that $\sum_{u \in U} y_u = \tau^*$.

The proof can be found in, e.g., […]. We use the first part of Lemma 2 in Section 5 to obtain a lower bound on $\tau^*$ (and hence on $\tau$), and the second part in the proof of Theorem 13 to associate “weights” with 1-elements in the matrix.
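As a toy illustration of Lemma 2 (the instance is hypothetical and not from the paper), take U = {1, 2, 3} covered by its three 2-element subsets with unit weights. The sketch below uses exact rational arithmetic to exhibit a fractional covering and a dual solution of equal value 3/2, so both are optimal by weak duality, while the best integral covering has value 2.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical instance: U = {1, 2, 3}, the three 2-element subsets, unit weights.
U = [1, 2, 3]
F = [frozenset({1, 2}), frozenset({2, 3}), frozenset({1, 3})]
w = {S: 1 for S in F}


def primal_feasible(x):
    """Constraints of the linear relaxation (Figure 2b)."""
    return (all(0 <= x[S] <= 1 for S in F)
            and all(sum(x[S] for S in F if u in S) >= 1 for u in U))


def dual_feasible(y):
    """Constraints of the dual (Figure 2c)."""
    return (all(y[u] >= 0 for u in U)
            and all(sum(y[u] for u in S) <= w[S] for S in F))


x = {S: Fraction(1, 2) for S in F}
y = {u: Fraction(1, 2) for u in U}
primal = sum(w[S] * x[S] for S in F)
dual = sum(y.values())

# Integral optimum tau by brute force over subfamilies.
tau = min(sum(w[S] for S in sub)
          for r in range(1, len(F) + 1) for sub in combinations(F, r)
          if set().union(*sub) == set(U))

print(primal_feasible(x), dual_feasible(y), primal, dual, tau)  # True True 3/2 3/2 2
```

The gap between $\tau^* = 3/2$ and $\tau = 2$ shows that the inequality $\tau^* \le \tau$ can be strict.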
Greedy coverings

The greedy heuristic for the unweighted set cover problem works as follows. It maintains the set of uncovered elements, initially U, and iteratively adds to $\mathcal{F}'$ (which is initially empty) a set $S \in \mathcal{F}$ which covers the largest number of yet-uncovered elements. Any covering obtained by this (nondeterministic) procedure is called a greedy covering. (There is a natural extension to the weighted version as well.)
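The greedy procedure translates directly into code. In this sketch (the function name and the instance are our own), ties are broken in favour of the smallest index, which is one admissible resolution of the nondeterminism; the instance shows that a greedy covering need not be optimal.

```python
def greedy_cover(universe, family):
    """Return the indices of the sets chosen by the greedy heuristic.

    Ties are broken in favour of the smallest index, one of the
    nondeterministic choices allowed by the procedure.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(range(len(family)), key=lambda t: len(family[t] & uncovered))
        gain = family[best] & uncovered
        if not gain:
            raise ValueError("the family does not cover the universe")
        chosen.append(best)
        uncovered -= gain
    return chosen


# Hypothetical instance on which greedy is not optimal.
family = [{1, 2, 3, 4}, {1, 2, 5}, {3, 4, 6}]
print(greedy_cover(range(1, 7), family))  # [0, 1, 2]
```

Greedy first takes the largest set {1, 2, 3, 4} and then needs two more sets, whereas {1, 2, 5} and {3, 4, 6} alone form a covering of size 2.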

A standard analysis of the greedy heuristic is performed in the framework of approximation algorithms: the size of a greedy covering is at most $O(\log |U|)$ times larger than that of the optimal covering […]. But for our purposes a different upper bound will be more convenient: an “absolute” upper bound in terms of the “density” of the instance. Such a bound is given by the following result, which is substantially less well-known:


Lemma 3 (greedy covering lemma). Suppose every element $s \in U$ is contained in at least $\gamma |\mathcal{F}|$ sets from $\mathcal{F}$, where $0 < \gamma \le 1$. Then the size of any greedy covering does not exceed
$$\frac{1}{\gamma} \ln^+(\gamma |U|) + \frac{1}{\gamma} + 1,$$
where $\ln^+(x) = \max(0, \ln x)$ and $\ln x$ is the natural logarithm.


Several versions of the lemma can be found in the literature. It was proved for the first time in 1972 by Sapozhenko […] and appears in later textbooks […, Lemma 9 in Section 3, pp. 136-137], […, pp. 134-135]. A slightly different form, attributed to Stein [35] and Lovász [23], was independently obtained later and is sometimes known as the Lovász-Stein theorem; yet another proof is due to Karpinski and Zelikovsky [18]. Recent treatments with applications and more detailed discussion can be found in Deng et al. […] and in Jukna’s textbook […, pp. 34-37].

Since the upper bound of Lemma 3 is hardly a standard tool in theoretical computer science as of now, a remark on the proof is in order. A standalone proof goes via the following fact: on each step of the greedy algorithm the number of yet-uncovered elements shrinks by a constant factor, determined by the density parameter $\gamma$. Indeed, by averaging, some set contains at least a $\gamma$ fraction of the yet-uncovered elements, so their number is multiplied by at most $1 - \gamma$. Alternatively, one can use the result due to Lovász [23] that the size of any greedy covering is within a factor of $1 + \log |U|$ from the optimal fractional covering. Since assigning the value $\bigl(\min_{s \in U} |\{S \in \mathcal{F} : s \in S\}|\bigr)^{-1} \le 1/(\gamma |\mathcal{F}|)$ to all $x_S$, $S \in \mathcal{F}$, in the linear program in Figure 2b leads to a feasible solution of cost at most $1/\gamma$, an upper bound of $(1/\gamma) \cdot (1 + \log |U|)$ follows.

We use Lemma 3 in Section 6 to obtain an upper bound on the $\mathrm{OR}_2$-complexity of Kneser-Sierpiński matrices. We remark that instead of greedy coverings one can use random coverings to essentially the same effect (cf. Deng et al. […]).
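An empirical sanity check of Lemma 3 on random instances can be sketched as follows, under our own assumptions: $\gamma$ is taken as the smallest number of sets containing an element divided by $|\mathcal{F}|$, greedy ties are broken by the first maximum, and the random families are hypothetical test data. The check that greedy never exceeds the bound illustrates the lemma; it does not prove it.

```python
import math
import random


def greedy_size(universe, family):
    """Number of sets picked by the greedy heuristic (ties: first maximum)."""
    uncovered, count = set(universe), 0
    while uncovered:
        best = max(family, key=lambda S: len(S & uncovered))
        uncovered -= best
        count += 1
    return count


def lemma3_bound(universe, family):
    """(1/gamma) ln+(gamma |U|) + 1/gamma + 1 with gamma = min degree / |F|."""
    degree = min(sum(1 for S in family if s in S) for s in universe)
    gamma = degree / len(family)
    return math.log(max(1.0, gamma * len(universe))) / gamma + 1 / gamma + 1


def random_instance(seed, n_elements=200, n_sets=60, p=0.15):
    """Hypothetical random family; every element is forced into at least one set."""
    rng = random.Random(seed)
    family = [{s for s in range(n_elements) if rng.random() < p} for _ in range(n_sets)]
    for s in range(n_elements):
        if not any(s in S for S in family):
            family[rng.randrange(n_sets)].add(s)
    return range(n_elements), family


for seed in range(3):
    U, F = random_instance(seed)
    print(seed, greedy_size(U, F), round(lemma3_bound(U, F), 1))
```

On such random families the bound is usually far from tight; its strength lies in instances with large, uniform density, such as those in Lemma 9.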


5 Lower bound for the full triangular matrices

Define the $n \times n$ full triangular matrix $T_n = (t_{ij})_{0 \le i, j < n}$ by $t_{ij} = 1$ if $i < j$ and $t_{ij} = 0$ otherwise. This matrix $T_n$ is the adjacency matrix of the strict linear order $0 < 1 < \dots < n - 1$, viewed as a directed graph; it has 1s everywhere above the main diagonal and 0s on the diagonal and below. In this section, we study the smallest size of depth-2 rectifier networks that express $T_n$.

Define $s(n) = n(\lfloor \log_2 n \rfloor + 2) - 2^{\lfloor \log_2 n \rfloor + 1}$ for $n \ge 1$. Note that s(n) is the so-called binary entropy function, sequence A003314 in Sloane’s Encyclopedia of Integer Sequences [37]. Its properties were studied previously by Morris […] because of its connection with mergesort.
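A quick check of the closed form for s(n), as a sketch: the function below evaluates the formula with integer arithmetic (`bit_length` gives $\lfloor \log_2 n \rfloor$ for positive n). It satisfies $s(n+1) = s(n) + \lfloor \log_2 n \rfloor + 2$, the recurrence used in the proof of Lemma 6 below, as well as $s(2^q) = q \cdot 2^q$, i.e., $s(n) = n \log_2 n$ when n is a power of 2.

```python
def s(n):
    """s(n) = n(floor(log2 n) + 2) - 2^(floor(log2 n) + 1), for n >= 1."""
    q = n.bit_length() - 1          # floor(log2 n) for a positive integer n
    return n * (q + 2) - 2 ** (q + 1)


print([s(n) for n in range(1, 10)])  # [0, 2, 5, 8, 12, 16, 20, 24, 29]
```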
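(a) Portion of the matrix M (rows i and columns j from 0 to 16):
$$
\begin{array}{c|ccccccccccccccccc}
i \backslash j & 0&1&2&3&4&5&6&7&8&9&10&11&12&13&14&15&16\\ \hline
0 & 0&2&1&0&1&0&0&0&1&0&0&0&0&0&0&0&1\\
1 & 0&0&2&1&0&1&0&0&0&1&0&0&0&0&0&0&0\\
2 & 0&0&0&2&1&0&1&0&0&0&1&0&0&0&0&0&0\\
3 & 0&0&0&0&2&1&0&1&0&0&0&1&0&0&0&0&0\\
4 & 0&0&0&0&0&2&1&0&1&0&0&0&1&0&0&0&0\\
5 & 0&0&0&0&0&0&2&1&0&1&0&0&0&1&0&0&0\\
6 & 0&0&0&0&0&0&0&2&1&0&1&0&0&0&1&0&0\\
7 & 0&0&0&0&0&0&0&0&2&1&0&1&0&0&0&1&0\\
8 & 0&0&0&0&0&0&0&0&0&2&1&0&1&0&0&0&1\\
9 & 0&0&0&0&0&0&0&0&0&0&2&1&0&1&0&0&0\\
10 & 0&0&0&0&0&0&0&0&0&0&0&2&1&0&1&0&0\\
11 & 0&0&0&0&0&0&0&0&0&0&0&0&2&1&0&1&0\\
12 & 0&0&0&0&0&0&0&0&0&0&0&0&0&2&1&0&1\\
13 & 0&0&0&0&0&0&0&0&0&0&0&0&0&0&2&1&0\\
14 & 0&0&0&0&0&0&0&0&0&0&0&0&0&0&0&2&1\\
15 & 0&0&0&0&0&0&0&0&0&0&0&0&0&0&0&0&2\\
16 & 0&0&0&0&0&0&0&0&0&0&0&0&0&0&0&0&0
\end{array}
$$

(b) Definition of a, b, c, d, e: a schematic of the rows R and columns C of a minimal counterexample, marking c = max R, e = max C, and the nonzero entries used in the proof [diagram not reproduced].

Figure 3 Illustrations for the proof of Theorem 4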


Theorem 4. All fractional coverings of $T_n$ have cost at least s(n).

Corollary 5. $\mathrm{OR}_2(T_n) = s(n)$.

Corollary 5 gives the exact value of $\mathrm{OR}_2(T_n)$. The upper bound is an easy divide-and-conquer argument (reproduced in the appendix for completeness), and the main challenge is to obtain the lower bound.
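One natural divide-and-conquer scheme (our own choice of split; the appendix argument may differ in details) covers $T_n$ by the rectangle “first half of the indices × second half” and recurses on both halves. Its cost satisfies $c(n) = n + c(\lfloor n/2 \rfloor) + c(\lceil n/2 \rceil)$, and the sketch checks that this yields a valid covering of cost exactly s(n) for small n.

```python
def dc_cover(lo, hi):
    """Rectangles (R, C) covering every (i, j) with lo <= i < j < hi.

    One rectangle (first half) x (second half), then recurse on both halves;
    this particular split is our own choice.
    """
    if hi - lo < 2:
        return []
    mid = (lo + hi) // 2
    return [(range(lo, mid), range(mid, hi))] + dc_cover(lo, mid) + dc_cover(mid, hi)


def s(n):
    q = n.bit_length() - 1
    return n * (q + 2) - 2 ** (q + 1)


def check(n):
    """Return (is a valid covering of T_n, total cost)."""
    cover = dc_cover(0, n)
    covered = {(i, j) for R, C in cover for i in R for j in C}
    ones = {(i, j) for i in range(n) for j in range(n) if i < j}
    valid = all(max(R) < min(C) for R, C in cover)   # each piece is a 1-rectangle of T_n
    return covered == ones and valid, sum(len(R) + len(C) for R, C in cover)


print(check(10))  # (True, 34)
```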

Consider the weighted set covering formulation for $T_n$, where the optimal value is $\mathrm{OR}_2(T_n)$, as discussed in Section 4. By Lemma 2, it suffices to find a feasible solution to the dual linear program with the value s(n). Our feasible solution is given by a certain infinite upper-triangular matrix M, with rows and columns indexed by the natural numbers, defined as follows:


$$M_{ij} = \begin{cases} 2, & \text{if } j - i = 1;\\ 1, & \text{if } j - i = 2^q \text{ for some } q \ge 1;\\ 0, & \text{otherwise.} \end{cases}$$

The first 17 rows and columns of M are displayed in Figure 3a. Notice that each row is a shift, by 1, of the preceding row.


Lemma 6. The sum of the elements of the $n \times n$ upper left submatrix of M, $M_n$, is equal to s(n).

Lemma 7. $y_{ij} = M_{ij}$ for $0 \le i < j < n$ is a feasible solution to the dual program.

Proof of Lemma 6. $M_{n+1}$ is obtained from $M_n$ by concatenating a row of 0s on the bottom and, on the right, column n, which contains a single 2 (in row $n - 1$) and a 1 in each row $n - 2^q$ with $2 \le 2^q \le n$. There are $\lfloor \log_2 n \rfloor$ such rows, so $s(n + 1) = s(n) + \lfloor \log_2 n \rfloor + 2$. Since $M_1 = (0)$ and $s(1) = 0$, the result now follows by an easy induction. ◄
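Lemma 6 and the column-sum step of its proof can be confirmed numerically for small n. The sketch encodes the definition of M directly; the names `M`, `s`, and `corner_sum` are ours.

```python
def M(i, j):
    """Entry M_ij of the infinite matrix defined above."""
    d = j - i
    if d == 1:
        return 2
    if d >= 2 and d & (d - 1) == 0:     # d = 2^q with q >= 1
        return 1
    return 0


def s(n):
    q = n.bit_length() - 1
    return n * (q + 2) - 2 ** (q + 1)


def corner_sum(n):
    """Sum of the entries of the n x n upper left submatrix M_n."""
    return sum(M(i, j) for i in range(n) for j in range(n))


print([corner_sum(n) for n in range(1, 10)])  # [0, 2, 5, 8, 12, 16, 20, 24, 29]
```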


Proof of Lemma 7. To prove feasibility, we need to see that for each pair of nonempty sets $R, C \subseteq \{0, 1, \dots, n - 1\}$ with $\max R < \min C$ (only such pairs (R, C) are rectangles of $T_n$) we have
$$\sum_{i \in R} \sum_{j \in C} M_{ij} \le |R| + |C|. \qquad (1)$$
Here R corresponds to a choice of rows of M and C to a choice of columns.
Suppose there exists a counterexample to (1). Among all counterexamples to (1), consider one with the smallest possible value of |R| + |C|. If |R| = 1 then, since at most one entry in each row is 2 and all others are either 0 or 1, we clearly have $\sum_{i \in R} \sum_{j \in C} M_{ij} \le |C| + 1 = |R| + |C|$. Hence $|R| \ge 2$. The same argument applies if |C| = 1. Thus the minimal counterexample to (1) has at least two rows and at least two columns.

We now observe that the row sum (over the columns in C) of each row in our counterexample is at least 2. For if it is 0 or 1, we could omit that row, and (1) would still be violated, contradicting minimality. The same argument applies to the column sums (over the rows in R). We now prove:

Claim. Suppose there are at least two nonzero elements in the submatrix of M formed by rows 0, 1, …, b and column e of M. Then $e \le 2b$.

Proof. The nonzero elements in column e occur precisely in the rows numbered $e - 1, e - 2, e - 4, \dots, e - 2^i$, where i is the largest integer with $e - 2^i \ge 0$. So if there are at least two nonzero elements in rows 0, 1, …, b, these include the two smallest such rows, $e - 2^i$ and $e - 2^{i-1}$ (in particular, $i \ge 1$). So $e - 2^{i-1} \le b$. It now follows that $e \le b + 2^{i-1} = b + \frac{1}{2} \cdot 2^i \le b + \frac{1}{2} e$ (since $e \ge 2^i$), and so $e \le 2b$. This concludes the proof of the claim. ◄

Now let $c = \max R$ and $e = \max C$ in our minimal counterexample. Since $|C| \ge 2$, we have $\min C < e$, so $e - 1 \ge \min C > c$; hence the entry 2 of column e (in row $e - 1$) lies outside R, and the column sum of column e over R, which is at least 2, comes from at least two nonzero elements. By the Claim above (applied with b = c), we know $e \le 2c$. Now let b be the largest element of R for which there is a nonzero element in column e. Let a be any row < b in R with a nonzero element in column e; this must exist since column e has at least two nonzero elements in rows from R. Since $a + 1 \le b \le c < \min C$, the entry 2 of row a lies outside C, so row a, whose row sum over C is at least 2, has at least two nonzero elements in columns from C. Hence we can let d be any column < e in C with a nonzero element in row a. We claim $d \le c$.

To see this, note that $b = e - 2^j \le c$ for some $j \ge 0$. Then we must have $a = e - 2^k \ge 0$ where $k \ge j + 1$. Then $d - a = 2^\ell$ for some $\ell$. So $d - a = d - (e - 2^k) = 2^\ell$, and hence $d = e + 2^\ell - 2^k$. Since d < e, we have $\ell < k$. So $d \le e + 2^{k-1} - 2^k = e - 2^{k-1} \le e - 2^j = b \le c$. This is illustrated in Figure 3b.

Now $\max R < \min C$, but $d \le c$ while $d \in C$ and $c \in R$, a contradiction. Hence there are no minimal counterexamples and no counterexamples at all. Thus (1) holds. It follows that M represents a feasible solution. This concludes the proof of Lemma 7. ◄
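For small n, inequality (1) can also be verified exhaustively, which is a useful sanity check on Lemma 7 (though not a substitute for the proof). The sketch below, with helper names of our own, enumerates every rectangle (R, C) of $T_n$ with $\max R < \min C$; it is exponential in n and intended only for n up to about 12.

```python
def M(i, j):
    """Entry M_ij: 2 if j - i = 1, 1 if j - i = 2^q with q >= 1, else 0."""
    d = j - i
    if d == 1:
        return 2
    if d >= 2 and d & (d - 1) == 0:
        return 1
    return 0


def violations(n):
    """All rectangles (R, C) of T_n violating inequality (1); brute force, tiny n only."""
    bad = []
    for rmask in range(1, 1 << n):
        rows = [i for i in range(n) if rmask >> i & 1]
        free = list(range(rows[-1] + 1, n))          # columns allowed after max R
        for cmask in range(1, 1 << len(free)):
            cols = [free[t] for t in range(len(free)) if cmask >> t & 1]
            if sum(M(i, j) for i in rows for j in cols) > len(rows) + len(cols):
                bad.append((rows, cols))
    return bad


print([len(violations(n)) for n in range(1, 11)])  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

Many rectangles attain equality in (1); for instance, R = {0} and C = {1, 2} give 2 + 1 = 3 = |R| + |C|.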

Let us complete the proof of Theorem 4. Apply the first part of Lemma 2 to the weighted set covering formulation of the problem and take the solution $y_{ij} = M_{ij}$, $0 \le i < j < n$, as described above. This solution has value s(n) by Lemma 6 (all entries of $M_n$ on or below the diagonal are 0) and is feasible by Lemma 7. Hence, all fractional coverings have cost at least s(n).


6 Upper bound for Kneser-Sierpiński matrices

Suppose $n = 2^k$. A Kneser-Sierpiński matrix (or a disjointness matrix) of size $2^k \times 2^k$ is the matrix $D_n$ defined as follows. Rows and columns of the matrix are indexed from 0 to $2^k - 1$. The matrix has a 1 at all positions (i, j) such that i and j have no common 1 in their binary expansion; all other elements of the matrix are 0.

Note that if we identify each number from $\{0, \dots, n - 1\}$ with a subset of $\{1, \dots, k\}$ in the natural way, then $D_n$ is naturally associated with a Boolean function that maps a pair of subsets of $\{1, \dots, k\}$ to 1 if they are disjoint, and to 0 if they have an element in common. An alternative way to define $D_n$ is by the recurrence
$$D_{2n} = \begin{pmatrix} D_n & D_n \\ D_n & 0 \end{pmatrix} \text{ for } n \ge 1, \qquad D_1 = (1);$$
here subsets of $\{1, \dots, k\}$ are ordered lexicographically. Using the antilexicographic order for rows and the lexicographic order for columns would lead to a lower triangular matrix.
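Both definitions of $D_n$ are easy to compare in code. In this sketch (function names are ours), the recurrence places the indices with the new, most significant bit set in the second half; reversing the row order (which replaces each row index by its complement) yields the lower triangular shape mentioned above.

```python
def disjointness(k):
    """D_n for n = 2^k: entry (i, j) is 1 iff i and j share no 1 bit."""
    n = 1 << k
    return [[int(i & j == 0) for j in range(n)] for i in range(n)]


def by_recurrence(k):
    """D_{2n} = [[D_n, D_n], [D_n, 0]], starting from D_1 = (1)."""
    D = [[1]]
    for _ in range(k):
        h = len(D)
        D = [row + row for row in D] + [row + [0] * h for row in D]
    return D


for row in disjointness(3):
    print("".join(map(str, row)))
```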

What is the size of smallest depth-2 rectifier networks that express Kneser-Sierpiński matrices? Jukna and Sergeev [15, Lemma 4.2] prove that
$$n^{\frac{1}{2}\log 5}/\mathrm{polylog}(n) \le \mathrm{OR}_2(D_n) \le n^{\log(1 + \sqrt{2})} \cdot \mathrm{polylog}(n), \qquad (2)$$


and in this section, we prove the following result: 

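Theorem 8. $\mathrm{OR}_2(D_n) \le n^{\log(9/4)} \cdot \mathrm{polylog}(n)$.

Note that $\frac{1}{2}\log 5 \approx 1.16096$, $\log(9/4) \approx 1.16993$, and $\log(1 + \sqrt{2}) \approx 1.27$ (logarithms are to base 2).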

Suppose $n = 2^k$ as above, and let $D_n^{x,y}$ be the submatrix of $D_n$ whose rows and columns correspond to x-sized and y-sized subsets of $\{1, \dots, k\}$, respectively. This matrix has size $\binom{k}{x} \times \binom{k}{y}$. If $x = y$, then $D_n^{x,x}$ is the adjacency matrix of the Kneser graph [25].

For $0 \le y \le x \le k$, write $z = (k-x-y)/2$ and
$$f(x,y) = \binom{k}{x,\, z,\, k-x-z} \Big/ \binom{2z}{z}.$$

Jukna and Sergeev [?, Lemma 4.2] show that all coverings of $D_n^{x,x}$ have cost at least $f(x,x)/\mathrm{poly}(k)$, and this gives the lower bound in equation (2): taking $x = 0.4k$ brings $f(x,x)$ to its maximum of $n^{\frac12 \log 5}$, if we disregard factors polylogarithmic in $n = 2^k$. Our Theorem 8 follows from Lemmas 9 and 11 below.


► Lemma 9. There exists a covering of $D_n^{x,y}$ with cost at most $f(x,y) \cdot \mathrm{poly}(k)$.

Proof. Consider $\mathcal{F}$, the family of all ordered bipartitions of $\{1, \ldots, k\}$ into sets of size $x+z$ and $y+z$, where $z = (k-x-y)/2$. Technically, an ordered bipartition is simply a subset of $\{1, \ldots, k\}$, but it is more instructive to view it as an ordered pair: this subset and its complement. Every such bipartition $(S, \bar S)$ corresponds to a (maximal) rectangle in $D_n^{x,y}$: the elements of $D_n^{x,y}$ covered by the rectangle are the pairs $(X, Y)$ of disjoint sets that respect the bipartition, i.e., $X \subseteq S$ and $Y \subseteq \bar S$.

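Use the greedy covering lemma for the unweighted set covering formulation with $\mathcal{F}$. There are $\binom{k}{x+z}$ bipartitions in this family, and every pair of disjoint sets $(X, Y)$ of sizes $x$ and $y$ respects $\binom{k-x-y}{z} = \binom{2z}{z}$ of them, so $\gamma = \binom{2z}{z} \big/ \binom{k}{x+z}$, and any greedy covering will contain at most $N$ sets, where
$$N = \frac{\binom{k}{x+z}}{\binom{2z}{z}} \cdot \bigl(1 + \ln(4^k)\bigr) + 1 = \frac{\binom{k}{x+z}}{\binom{2z}{z}} \cdot \mathrm{poly}(k).$$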


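For every bipartition in the covering, the corresponding 1-rectangle in $D_n^{x,y}$ will include $\binom{x+z}{x}$ rows and $\binom{y+z}{y}$ columns; its cost will be at most $2\binom{x+z}{x}$ as $y \le x$. So the total cost of the covering will not exceed
$$\binom{x+z}{x} \cdot 2N = \frac{\binom{k}{x,\, z,\, k-x-z}}{\binom{2z}{z}} \cdot \mathrm{poly}(k) = f(x,y) \cdot \mathrm{poly}(k).$$
◄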
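The greedy step in this proof is easy to simulate for small $k$. The Python sketch below (a hypothetical helper written for illustration, not code from the paper) enumerates the 1-entries $(X, Y)$ of $D_n^{x,y}$, uses the rectangles of bipartitions with $|S| = x + z$ as candidate sets, and repeatedly picks the rectangle covering the most still-uncovered entries. It assumes $k - x - y$ is even and nonnegative, and it works by brute force, so it is only meant for tiny parameters. For comparison it also evaluates $f(x,y) = \binom{k}{x,\,z,\,k-x-z}/\binom{2z}{z}$, the quantity bounding the cost in the lemma.

```python
from itertools import combinations
from math import comb


def greedy_bipartition_cover(k, x, y):
    """Greedy covering of the 1-entries of D_n^{x,y} (n = 2**k) by the rectangles
    of bipartitions (S, complement of S) with |S| = x + z, z = (k - x - y) / 2."""
    assert k - x - y >= 0 and (k - x - y) % 2 == 0
    z = (k - x - y) // 2
    ground = range(k)
    # 1-entries: pairs (X, Y) of disjoint sets with |X| = x and |Y| = y.
    universe = set()
    for X in combinations(ground, x):
        rest = [e for e in ground if e not in X]
        for Y in combinations(rest, y):
            universe.add((frozenset(X), frozenset(Y)))

    def covered_by(S, entries):
        # The rectangle of (S, complement) holds the entries with X inside S and Y outside S.
        return {(X, Y) for X, Y in entries if X <= S and not (Y & S)}

    candidates = [frozenset(S) for S in combinations(ground, x + z)]
    uncovered, chosen = set(universe), []
    while uncovered:
        best = max(candidates, key=lambda S: len(covered_by(S, uncovered)))
        uncovered -= covered_by(best, uncovered)
        chosen.append(best)
    # Each rectangle has C(x+z, x) rows and C(y+z, y) columns; its cost is their sum.
    cost = len(chosen) * (comb(x + z, x) + comb(y + z, y))
    return chosen, cost, universe


def f(k, x, y):
    """f(x, y) = multinomial(k; x, z, k - x - z) / C(2z, z), as a float."""
    z = (k - x - y) // 2
    return comb(k, x) * comb(k - x, z) / comb(2 * z, z)


chosen, cost, universe = greedy_bipartition_cover(6, 2, 2)
print(len(chosen), cost, f(6, 2, 2))
```

Since every rectangle here covers 9 of the 90 entries, at least 10 rectangles are needed; the greedy choice can be compared against that count.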


► Corollary 10. Suppose $0 \le m \le k/2$, and consider $D_n^{m,m}$, the adjacency matrix of the (bipartite) Kneser graph: vertices in each part are size-$m$ subsets of $\{1, \ldots, k\}$, and two vertices from different parts are adjacent if and only if the subsets are disjoint. Then
$$d(m,k)/\mathrm{poly}(k) \le \mathrm{OR}_2(D_n^{m,m}) \le d(m,k) \cdot \mathrm{poly}(k), \quad \text{where } d(m,k) = f(m,m) = \binom{k}{m,\, k/2-m,\, k/2} \Big/ \binom{k-2m}{k/2-m}.$$

We use the standard notation for multinomial coefficients:
$$\binom{k}{a,\, b,\, c} = \frac{k!}{a!\, b!\, c!},$$
provided that $a + b + c = k$.
► Lemma 11. If $0 \le y \le x \le k$, then $f(x,y) \le n^{\log(9/4)} \cdot \mathrm{poly}(k)$, and there exists a pair $(x^*, y^*)$ such that $f(x^*, y^*) \ge n^{\log(9/4)}/\mathrm{poly}(k)$.


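Proof. As above, let $2z = k - (x+y)$. Denote $\alpha = z/k$ and recall that the values of the binomial coefficients may be estimated with the help of the binary entropy function (not to be confused with the function $s(n)$ used earlier, which is also known under this name):
$$\binom{k}{\lambda k} \sim \frac{2^{kH(\lambda)}}{\sqrt{2\pi\lambda(1-\lambda)k}}$$
as $k \to \infty$, for fixed $0 < \lambda < 1$, where $H(\lambda) = -\lambda \log \lambda - (1-\lambda)\log(1-\lambda)$. This formula follows from Stirling's approximation for the factorial [7, Chapter 9 and Solution to Exercise 9.42]. Now
$$f(x,y) = \frac{\binom{k}{z}\binom{k-z}{x}}{\binom{2z}{z}} \le \frac{2^{kH(\alpha)} \cdot 2^{(1-\alpha)kH(1/2)}}{2^{2\alpha k H(1/2)}} \cdot \mathrm{poly}(k) = 2^{(H(\alpha) + 1 - 3\alpha)k} \cdot \mathrm{poly}(k),$$
as $H(1/2) = 1$. Simple calculations show that for $0 \le \alpha \le 1/2$ the inequality $H(\alpha) + 1 - 3\alpha \le H(1/9) + 1 - 3 \cdot 1/9 = \log(9/4)$ holds (the maximum is where $H'(\alpha) = \log\frac{1-\alpha}{\alpha} = 3$); since $n = 2^k$, this gives $f(x,y) \le n^{\log(9/4)} \cdot \mathrm{poly}(k)$. The value $\alpha = 1/9$ corresponds to $x = 4/9 \cdot k$ and $y = 3/9 \cdot k$; for this pair, $\binom{k-z}{x}$ is a central binomial coefficient, so all the estimates above are tight up to $\mathrm{poly}(k)$ factors. ◄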
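The maximization in the last step can be double-checked numerically. The Python sketch below (an illustration only) evaluates the exponent $H(\alpha) + 1 - 3\alpha$ on a fine grid over $[0, 1/2]$ that contains $\alpha = 1/9$, and compares the maximizer and the maximum with $1/9$ and $\log(9/4)$; logarithms are to base 2, as in the text.

```python
from math import log2


def H(a):
    """Binary entropy in bits, with H(0) = H(1) = 0."""
    if a <= 0.0 or a >= 1.0:
        return 0.0
    return -a * log2(a) - (1 - a) * log2(1 - a)


def exponent(a):
    """Exponent in the bound f(x, y) <= 2^{(H(a) + 1 - 3a) k} poly(k), where a = z / k."""
    return H(a) + 1 - 3 * a


grid = [i / 90000 for i in range(45001)]  # 0 <= a <= 1/2; contains a = 1/9
best = max(grid, key=exponent)
print(best, exponent(best), log2(9 / 4))
```

A grid search is used instead of solving $H'(\alpha) = 3$ symbolically, so that the check does not rely on the calculus step it is meant to confirm.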


To complete the proof of Theorem 8, it remains to note that a union of coverings of the matrices $D_n^{x,y}$ for all pairs $x, y$ with $0 \le x, y \le k$ constitutes a covering of $D_n$. For $0 \le y \le x \le k$, the coverings are constructed by Lemma 9, and for $x < y$ the construction just swaps the roles of $x$ and $y$. Since there are only $(k+1)^2 = \mathrm{polylog}(n)$ pairs $x, y$ in total, the desired bound follows from Lemma 11. ◄

► Remark 12. Although Theorem 8 leaves a gap between the bounds on $\mathrm{OR}_2(D_n)$, the greedy strategy is, in fact, optimal: for each $D_n^{x,y}$ it suffices to use bipartitions into sets of size $\ell$ and $k - \ell$ for some $\ell = \ell(k; x, y)$. (See Appendix C for more details.) Our choice of $\ell$ in Lemma 9 is $\ell = x + (k-x-y)/2$, and the optimal choice, $\ell = \ell^*(k; x, y)$, will deliver an upper bound on $\mathrm{OR}_2(D_n)$ that is tight up to a polylogarithmic factor. Numerical experiments seem to indicate that the actual value of $\mathrm{OR}_2(D_n)$ is within a polylog(n) factor from na, but no formal proof is known to us.


Lower bound for Kronecker products 


Given two matrices $K \in \{0,1\}^{m_1 \times n_1}$ and $M \in \{0,1\}^{m_2 \times n_2}$, their Kronecker (or tensor) product is the Boolean matrix $K \otimes M$ of size $(m_1 \cdot m_2) \times (n_1 \cdot n_2)$ defined as follows. Its rows are indexed by pairs $(i_1, i_2)$ and its columns by pairs $(j_1, j_2)$, where $1 \le i_s \le m_s$ and $1 \le j_s \le n_s$ for $s = 1, 2$. The entry of $K \otimes M$ at position $((i_1, i_2), (j_1, j_2))$ is defined as
$$K_{i_1, j_1} \cdot M_{i_2, j_2}.$$
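For concreteness, here is a small Python sketch of the Boolean Kronecker product (a hypothetical helper, using 0-based indices rather than the 1-based ones above). Row $(i_1, i_2)$ is stored at index $i_1 m_2 + i_2$ and column $(j_1, j_2)$ at index $j_1 n_2 + j_2$, which gives the usual block layout in which each entry of $K$ is replaced by a copy of $M$ or by a zero block.

```python
def kron_bool(K, M):
    """Boolean Kronecker product: entry ((i1, i2), (j1, j2)) equals K[i1][j1] * M[i2][j2]."""
    m1, n1 = len(K), len(K[0])
    m2, n2 = len(M), len(M[0])
    Q = [[0] * (n1 * n2) for _ in range(m1 * m2)]
    for i1 in range(m1):
        for i2 in range(m2):
            for j1 in range(n1):
                for j2 in range(n2):
                    # Row pair (i1, i2) -> i1 * m2 + i2; column pair (j1, j2) -> j1 * n2 + j2.
                    Q[i1 * m2 + i2][j1 * n2 + j2] = K[i1][j1] * M[i2][j2]
    return Q


K = [[1, 1], [0, 1]]
M = [[1, 0], [1, 1]]
for row in kron_bool(K, M):
    print(row)
```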

In this section we prove a lower bound on the $\mathrm{OR}(\cdot)$-measure of Kronecker products. Recall that the Boolean rank $\mathrm{rk}_\vee(K)$ is the optimal value of the unweighted set covering formulation (as in Figure 2a), where the set of 1-entries in the matrix $K$ is covered by all-1 rectangles. In the linear relaxation of this problem (as in Figure 2b), the goal is to assign weights $w(R) \in [0,1]$ to each 1-rectangle $R$ such that $\sum_{R \ni (i,j)} w(R) \ge 1$ for each 1-entry $(i,j)$ of $K$, minimizing $\sum_R w(R)$. Let the fractional rank $\mathrm{rk}^*_\vee(K)$ be the optimal value of this linear relaxation. The integrality gap result for the set cover problem [23] and the duality theorem imply that $\mathrm{rk}_\vee(K)/(1 + \log(m_1 n_1)) \le \mathrm{rk}^*_\vee(K) \le \mathrm{rk}_\vee(K)$. In graph-theoretic language, the number $\mathrm{rk}^*_\vee(K)$ is the fractional biclique cover number, denoted by $\mathrm{bc}^*(G)$, where $K$ is the adjacency matrix of the (bipartite) graph $G$. Fractional rank is known to be bounded from below by the fooling set number; see Watts [?, Theorem 2.2].


► Theorem 13. For any pair $K, M$ of Boolean matrices, $\mathrm{OR}(K \otimes M) \ge \mathrm{rk}^*_\vee(K) \cdot \mathrm{OR}(M)$.

Proof. First consider the unweighted set covering formulation for $K$, where the optimal value is $\mathrm{rk}_\vee(K)$ as discussed earlier, and take its linear relaxation, with the optimal value $\mathrm{rk}^*_\vee(K)$. By LP duality, there is an assignment of weights to the 1-elements of this matrix, $w(i,j) \in [0,1]$ for all $(i,j)$ with $K_{ij} = 1$, such that the following two conditions are satisfied (see Figure 2c). First, for each 1-rectangle $R_K$ of $K$, the sum $\sum_{(i,j) \in R_K} w(i,j)$ is at most 1. Second, $\sum_{(i,j) : K_{ij} = 1} w(i,j) = \mathrm{rk}^*_\vee(K)$.

Now let $\mathcal{N} = (V, E, \mathrm{in}, \mathrm{out})$ be a rectifier network of size $\mathrm{OR}(K \otimes M)$ that expresses $Q = K \otimes M$, where $K$ and $M$ have sizes as above. For an edge $e \in E$, let $\mathrm{To}(e) \subseteq \{1, \ldots, m_1\} \times \{1, \ldots, m_2\}$ be the set of row indices $(i_1, i_2)$ of $Q$ such that the node $\mathrm{out}(i_1, i_2)$ is reachable from the target of $e$. Similarly, let $\mathrm{From}(e) \subseteq \{1, \ldots, n_1\} \times \{1, \ldots, n_2\}$ be the set of column indices $(j_1, j_2)$ of $Q$ such that the source of $e$ is reachable from $\mathrm{in}(j_1, j_2)$. Then $R(e) = \mathrm{To}(e) \times \mathrm{From}(e)$ is a 1-rectangle of $Q$, since every pair in it is connected by a path through $e$. Moreover, define $\pi_s((i_1, i_2), (j_1, j_2)) = (i_s, j_s)$ for $s = 1, 2$ and $\pi_s(R) = \{\pi_s(r, c) : (r, c) \in R\}$. Then $\pi_1(R(e))$ and $\pi_2(R(e))$ are 1-rectangles in $K$ and $M$, respectively.

We assign real weights based on $w$ to each edge $e$ of $\mathcal{N}$ by the following rule:
$$w'(e) = \sum_{(i_1, j_1) \in \pi_1(R(e))} w(i_1, j_1).$$


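Since $\pi_1(R(e))$ is a 1-rectangle of $K$, one of the constraints on $w$ ensures that $w'(e) \le 1$ for each edge $e$ of $\mathcal{N}$. Consequently, $\sum_{e \in E} w'(e) \le |E| = \mathrm{OR}(K \otimes M)$; furthermore, the following chain of inequalities holds, where the sums over $(i_1, j_1)$ range over the 1-entries of $K$:
$$\begin{aligned}
\mathrm{OR}(K \otimes M) \ge \sum_{e \in E} w'(e) &= \sum_{e \in E} \sum_{(i_1, j_1) \in \pi_1(R(e))} w(i_1, j_1) \\
&= \sum_{(i_1, j_1)} w(i_1, j_1) \cdot \bigl|\{e \in E : (i_1, j_1) \in \pi_1(R(e))\}\bigr| \\
&= \sum_{(i_1, j_1)} w(i_1, j_1) \cdot \bigl|\{e \in E : i_1 \in \pi_1(\mathrm{To}(e)),\ j_1 \in \pi_1(\mathrm{From}(e))\}\bigr|.
\end{aligned} \tag{3}$$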

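Fix an arbitrary entry $(i_1, j_1)$ of $K$ with $K_{i_1, j_1} = 1$. Consider the subgraph $\mathcal{N}_{i_1, j_1}$ of $\mathcal{N}$ induced by the nodes that are reachable from some source of the form $\mathrm{in}(j_1, j_2)$ and from which a node of the form $\mathrm{out}(i_1, i_2)$ is reachable; in other words, take all nodes and edges on all paths from $\mathrm{in}(j_1, j_2)$ to $\mathrm{out}(i_1, i_2)$ for some $i_2, j_2$. Then, since $K_{i_1, j_1} = 1$, the node $\mathrm{out}(i_1, i_2)$ is reachable from $\mathrm{in}(j_1, j_2)$ in $\mathcal{N}_{i_1, j_1}$ if and only if $M_{i_2, j_2} = 1$. So the network $\mathcal{N}_{i_1, j_1}$ expresses $M$ (with the mappings $\mathrm{in}'(j_2) = \mathrm{in}(j_1, j_2)$ and $\mathrm{out}'(i_2) = \mathrm{out}(i_1, i_2)$). Hence, the number of edges in $\mathcal{N}_{i_1, j_1}$ is at least $\mathrm{OR}(M)$. But by our definitions, the relations $i_1 \in \pi_1(\mathrm{To}(e))$ and $j_1 \in \pi_1(\mathrm{From}(e))$ hold together exactly for the edges $e$ of $\mathcal{N}$ present in $\mathcal{N}_{i_1, j_1}$. Thus $|\{e \in E : i_1 \in \pi_1(\mathrm{To}(e)),\ j_1 \in \pi_1(\mathrm{From}(e))\}| \ge \mathrm{OR}(M)$, and we conclude from equation (3) that
$$\mathrm{OR}(K \otimes M) \ge \sum_{(i_1, j_1) : K_{i_1 j_1} = 1} w(i_1, j_1) \cdot \mathrm{OR}(M) = \mathrm{rk}^*_\vee(K) \cdot \mathrm{OR}(M).$$
◄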



► Remark 14. Let $\mathrm{SUM}(K)$ be the smallest size of an unambiguous rectifier network that expresses $K$. A rectifier network is unambiguous if for all $i, j$ it has at most one path from $\mathrm{in}(j)$ to $\mathrm{out}(i)$. Such networks are also known under the names of SUM-circuits [15] and cancellation-free circuits [5]. The same construction as above also proves the inequality $\mathrm{SUM}(K \otimes M) \ge \mathrm{rk}^*_\vee(K) \cdot \mathrm{SUM}(M)$.

► Corollary 15. For any pair of matrices $K \in \{0,1\}^{m_1 \times n_1}$ and $M \in \{0,1\}^{m_2 \times n_2}$, and $L \in \{\mathrm{OR}, \mathrm{SUM}\}$, it holds that $L(K \otimes M) \ge \mathrm{rk}_\vee(K) \cdot L(M)/(1 + \log(m_1 n_1))$.
A Depth-3 lower bound for the example matrix $M_n$

Consider the matrix $M_n = \begin{pmatrix} J_n & 0 \\ J_n & J_n \end{pmatrix}$ for some $n \ge 1$, where $J_n$ is the $n \times n$ all-one matrix.

Known bounds give $\mathrm{OR}(M_n) \ge 4n + 1$, and this bound is indeed attainable. For $\mathrm{OR}_3(M_n)$, i.e., realization by some rectifier network of exact depth 3, we show $\mathrm{OR}_3(M_n) = 4n + 3$ using the following lemma:

► Lemma 16. Suppose $M$ is a Boolean matrix and $\mathcal{N} = (V, E, \mathrm{in}, \mathrm{out})$ is a rectifier network realizing $M$ of some depth $d$. Then there exists a rectifier network $\mathcal{N}' = (V, E', \mathrm{in}, \mathrm{out})$ with $|\mathcal{N}'| \le |\mathcal{N}|$ having depth at most $d$ and satisfying the following conditions:

i) whenever the $i_1$th and the $i_2$th row of $M$ are the same, the sets $\{v \in V : (v, \mathrm{out}(i_1)) \in E'\}$ and $\{v \in V : (v, \mathrm{out}(i_2)) \in E'\}$ coincide;

ii) dually, whenever the $j_1$th and the $j_2$th column of $M$ are the same, then $\{v \in V : (\mathrm{in}(j_1), v) \in E'\} = \{v \in V : (\mathrm{in}(j_2), v) \in E'\}$.

Proof. Let $v = \mathrm{in}(j)$ be a source node and let $X_j$ stand for the set $\{w \in V : (v, w) \in E\}$ of its neighbours. Since $\mathcal{N}$ realizes $M$, the set of target nodes $\mathrm{out}(i)$ which are reachable from $v$ in $\mathcal{N}$ is exactly the image under $\mathrm{out}$ of those indices $i$ for which $M_{ij} = 1$. Now for each column index $j$ let $j'$ be the index such that the $j$th and the $j'$th column of $M$ are the same, $|X_{j'}|$ is the smallest possible among these sets, and $j'$ is the smallest among these indices. Note that $j'$ is always well-defined, and whenever the $j_1$th and the $j_2$th column coincide, then $j_1' = j_2'$.

Then define $\mathcal{N}_0$ as $(V, E_0, \mathrm{in}, \mathrm{out})$, where $E_0$ is obtained from $E$ by replacing, for every $j$, the edges $\{(\mathrm{in}(j), v) : v \in X_j\}$ with $\{(\mathrm{in}(j), v) : v \in X_{j'}\}$. (That is, we reattach the edges coming out from sources to the neighbours of the representative source of their equivalence class.)

Then by the choice of the values $j'$ (in particular, with $|X_{j'}|$ having been minimized), condition ii) is satisfied, $\mathcal{N}_0$ also realizes $M$, the depth is not increased (if $\mathcal{N}$ is strictly levelled), and $|\mathcal{N}_0| \le |\mathcal{N}|$. Applying the analogous transformation to the targets, we get a network $\mathcal{N}'$ satisfying i) as well. ◄


Thus we get that there exists a depth-3 network of minimal size realizing $M_n$ such that

• each source $\mathrm{in}(i)$ for $i = 1, \ldots, n$ has the same set $X_1$ of neighbours;

• each source $\mathrm{in}(i)$ for $i = n+1, \ldots, 2n$ has the same set $X_2$ of neighbours;

• each target $\mathrm{out}(j)$ for $j = 1, \ldots, n$ has the same set $Y_1$ of neighbours; and

• each target $\mathrm{out}(j)$ for $j = n+1, \ldots, 2n$ has the same set $Y_2$ of neighbours,

since the corresponding rows and columns coincide. In this network there are $n(|X_1| + |X_2| + |Y_1| + |Y_2|)$ edges in total between the outermost layers (and some additional edges between the two middle layers). Clearly none of these sets can be empty (since all the rows and columns are nonzero), and if any of them is a non-singleton set, the size of the network is at least $5n > 4n + 3$ (for $n > 3$). So in order to go below $5n$, $X_1 = \{x_1\}$, $X_2 = \{x_2\}$, etc. have to be singleton sets. Now since not all rows (columns, resp.) are equal, $x_1 \ne x_2$ and $y_1 \ne y_2$ have to hold, and there is only one choice (because the sets are singletons) to wire the two middle layers together, namely adding the edges $(x_1, y_1)$, $(x_1, y_2)$ and $(x_2, y_2)$, giving $4n + 3$ edges in total as the optimal value for depth $d = 3$.

Note that if the network is not required to be strictly levelled, we can merge $x_1$ with $y_1$ and $x_2$ with $y_2$ and add only the edge $(x_1, x_2)$, reaching the optimal bound $4n + 1$.
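Both constructions in this argument can be checked mechanically. The Python sketch below (our own illustration; all function and node names are hypothetical) builds the strictly levelled depth-3 network with middle nodes $x_1, x_2, y_1, y_2$ and the non-levelled variant with only $x_1, x_2$, computes the matrix each network expresses by graph search, and compares it with $M_n$ (rows indexed by targets, columns by sources, 0-based). It confirms only that the two networks realize $M_n$ with $4n + 3$ and $4n + 1$ edges; it does not check optimality.

```python
def realized_matrix(edges, sources, sinks):
    """Entry (i, j) is 1 iff sinks[i] is reachable from sources[j]."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)

    def reachable(start):
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in adj.get(u, []):
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return seen

    reach = [reachable(src) for src in sources]
    return [[1 if t in reach[j] else 0 for j in range(len(sources))] for t in sinks]


def target_matrix(n):
    """M_n = [[J_n, 0], [J_n, J_n]]."""
    return [[0 if (i < n and j >= n) else 1 for j in range(2 * n)] for i in range(2 * n)]


def levelled_network(n):
    """Depth 3: in(j) -> x1 / x2 -> y1 / y2 -> out(i), middle edges x1y1, x1y2, x2y2."""
    sources = [("in", j) for j in range(2 * n)]
    sinks = [("out", i) for i in range(2 * n)]
    edges = [(src, "x1" if j < n else "x2") for j, src in enumerate(sources)]
    edges += [("y1" if i < n else "y2", snk) for i, snk in enumerate(sinks)]
    edges += [("x1", "y1"), ("x1", "y2"), ("x2", "y2")]
    return edges, sources, sinks


def merged_network(n):
    """Not strictly levelled: x1 merged with y1, x2 merged with y2, plus the edge (x1, x2)."""
    sources = [("in", j) for j in range(2 * n)]
    sinks = [("out", i) for i in range(2 * n)]
    edges = [(src, "x1" if j < n else "x2") for j, src in enumerate(sources)]
    edges += [("x1" if i < n else "x2", snk) for i, snk in enumerate(sinks)]
    edges.append(("x1", "x2"))
    return edges, sources, sinks


for n in range(1, 6):
    e, src, snk = levelled_network(n)
    assert len(e) == 4 * n + 3 and realized_matrix(e, src, snk) == target_matrix(n)
    e, src, snk = merged_network(n)
    assert len(e) == 4 * n + 1 and realized_matrix(e, src, snk) == target_matrix(n)
```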
B Upper bound for the full triangular matrix $T_n$

Recall that a SUM-circuit for a matrix $M$ is the same as an unambiguous rectifier network: it is a rectifier network that has at most one path between any input-output pair. The smallest size of an unambiguous rectifier network that expresses $M$ is denoted by $\mathrm{SUM}(M)$; similarly, $\mathrm{SUM}_2(M)$ is the smallest size of an unambiguous rectifier network of depth 2 that expresses $M$. In the same way as rectifier networks of depth 2 correspond to rectangle coverings, unambiguous rectifier networks of depth 2 correspond to rectangle partitions (that is, coverings with no overlap between rectangles). If one views the matrices as adjacency matrices of bipartite graphs, then the measures $\mathrm{OR}_2(\cdot)$ and $\mathrm{SUM}_2(\cdot)$ correspond to minimal biclique coverings and minimal biclique partitions, respectively. Clearly, $\mathrm{OR}(M) \le \mathrm{SUM}(M)$ and $\mathrm{OR}_d(M) \le \mathrm{SUM}_d(M)$ for each depth $d$. Also, if $M = \sum_{i=1}^{t} M_i$, where the matrices $M_i$ have pairwise disjoint sets of 1-entries, then $\mathrm{SUM}(M) \le \sum_{i=1}^{t} \mathrm{SUM}(M_i)$, and similarly for $\mathrm{SUM}_2$.

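We show below that $\mathrm{SUM}_2(T_n) \le s(n) = n(\lfloor \log_2 n \rfloor + 2) - 2^{\lfloor \log_2 n \rfloor + 1}$; the lower bound on fractional coverings of $T_n$ from the main text will then imply that $\mathrm{OR}_2(T_n) = \mathrm{SUM}_2(T_n) = s(n)$.

First, let $J_n$ be the $n \times n$ all-1 matrix and $J_{m,k}$ the $m \times k$ all-1 matrix. Clearly, $\mathrm{SUM}_2(J_{m,k})$ is at most $m + k$. Second, observe that
$$T_{2n} = \begin{pmatrix} T_n & J_n \\ 0 & T_n \end{pmatrix}, \qquad T_{2n+1} = \begin{pmatrix} T_{n+1} & J_{n+1,n} \\ 0 & T_n \end{pmatrix}.$$
It follows that $\mathrm{SUM}_2(T_{2n}) \le 2\,\mathrm{SUM}_2(T_n) + 2n$ and $\mathrm{SUM}_2(T_{2n+1}) \le \mathrm{SUM}_2(T_n) + \mathrm{SUM}_2(T_{n+1}) + 2n + 1$. This shows, by induction, that $\mathrm{SUM}_2(T_n) \le s(n)$, since the induction basis is easily checked.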
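The induction can also be checked numerically. The Python sketch below (illustrative only; `t` and `s` are our names) evaluates the upper bound given by the two block decompositions, with $T_1$ the $1 \times 1$ zero matrix as the base case, and compares it with the closed form $s(n)$ for all $n < 2000$.

```python
from functools import lru_cache


def s(n):
    """Closed form s(n) = n(floor(log2 n) + 2) - 2^(floor(log2 n) + 1), for n >= 1."""
    L = n.bit_length() - 1  # floor(log2 n)
    return n * (L + 2) - 2 ** (L + 1)


@lru_cache(maxsize=None)
def t(n):
    """Upper bound on SUM_2(T_n) obtained from the two block decompositions."""
    if n == 1:
        return 0  # T_1 is the 1 x 1 zero matrix
    m = n // 2
    if n % 2 == 0:
        return 2 * t(m) + 2 * m          # T_{2m}: two copies of T_m plus J_m (cost 2m)
    return t(m) + t(m + 1) + 2 * m + 1   # T_{2m+1}: T_{m+1}, T_m plus J_{m+1,m}


assert all(t(n) == s(n) for n in range(1, 2000))
print([t(n) for n in range(1, 9)])
```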

C Optimality of the greedy strategy for Kneser-Sierpinski matrices

Although Theorem 8 leaves a gap between the bounds of $n^{\frac12\log 5}/\mathrm{polylog}(n)$ and $n^{\log(9/4)} \cdot \mathrm{polylog}(n)$ on $\mathrm{OR}_2(D_n)$, the greedy strategy is, in fact, optimal. We first give a brief sketch of the argument, and then fill in all the details below.

Consider the linear relaxation of the set covering formulation for each $D_n^{x,y}$. Note that only maximal rectangles (i.e., those associated with bipartitions) can participate in optimal fractional coverings. In fact, for any $\ell \in [x, k-y]$ there exists a fractional covering $\eta(\ell)$ of $D_n^{x,y}$ which uses only bipartitions into sets of size $\ell$ and $k - \ell$ and for which all "covering" constraints in the LP are tight: $\eta(\ell)$ uses all such bipartitions with multiplicity $1/\binom{k-x-y}{\ell-x}$. For a suitable $\ell$ this fractional covering is optimal, so it suffices to pick a single $\ell$. Hence, the problem reduces to an unweighted set covering formulation, where the greedy heuristic achieves a value within a factor of $1 + \log\bigl(\binom{\ell}{x}\binom{k-\ell}{y}\bigr) \le 1 + 2k = \mathrm{polylog}(n)$ of the optimum.

In more detail, first consider an arbitrary weighted set cover problem: let $S_1, \ldots, S_t \subseteq U$ be the sets, with $w_i > 0$ being the cost of $S_i$. Let $\rho = \min\{w_i/|S_i| : i = 1, \ldots, t\}$ be the best cost/utility ratio offered by the sets. Then, in the dual formulation of its LP relaxation, if one assigns uniformly $\rho$ to each element $u \in U$ of the universe, then each set $S_i$ gets total charge $\rho \cdot |S_i| \le w_i$; hence this uniform distribution is a solution to the dual, and $\rho \cdot |U|$ is a lower bound for the optimum of the primal problem by the weak duality theorem.
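The uniform dual solution is easy to compute. As an illustration (our own choice of instance, not one taken from the text), the Python sketch below applies it to $D_n^{1,1}$ for $k = 4$, i.e., the $4 \times 4$ matrix with 1s exactly off the diagonal, using all of its 1-rectangles $R \times C$ with cost $|R| + |C|$, and compares the bound $\rho \cdot |U|$ with the exact optimum found by a brute-force dynamic program over bitmasks. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def uniform_dual_bound(universe, sets, weights):
    """Lower bound rho * |U| with rho = min_i w_i / |S_i| (uniform dual solution)."""
    rho = min(Fraction(w, len(S)) for S, w in zip(sets, weights))
    return rho * len(universe)


def exact_cover_cost(universe, sets, weights):
    """Exact minimum-cost cover by dynamic programming over bitmasks (tiny inputs only)."""
    bit = {u: b for b, u in enumerate(universe)}
    masks = [sum(1 << bit[u] for u in S) for S in sets]
    full = (1 << len(universe)) - 1
    best = [None] * (full + 1)
    best[0] = 0
    for covered in range(full + 1):  # covered | m >= covered, so states are final when visited
        if best[covered] is None:
            continue
        for m, w in zip(masks, weights):
            nxt = covered | m
            if best[nxt] is None or best[covered] + w < best[nxt]:
                best[nxt] = best[covered] + w
    return best[full]


# Instance: D_n^{1,1} for k = 4 (1 at (a, b) iff a != b); all 1-rectangles R x C.
k = 4
universe = [(a, b) for a in range(k) for b in range(k) if a != b]
subsets = [frozenset(c) for r in range(1, k + 1) for c in combinations(range(k), r)]
rects = [(R, C) for R in subsets for C in subsets if not (R & C)]
sets = [{(a, b) for a in R for b in C} for R, C in rects]
weights = [len(R) + len(C) for R, C in rects]

lower = uniform_dual_bound(universe, sets, weights)
exact = exact_cover_cost(universe, sets, weights)
print(lower, exact)
```

Here the best ratio is attained by the $2 \times 2$ rectangles, i.e., by the bipartitions with $|S| = 2$, in line with the discussion that follows.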

For the case of the weighted covering by rectangles, a rectangle of size $a \times b$ has cost $a + b$ and covers $ab$ elements; hence its offered ratio is $\frac{a+b}{ab} = \frac1a + \frac1b$, which decreases strictly when either $a$ or $b$ increases. Thus the best ratios are always offered by maximal rectangles.

Now consider a rectangle $R$ in $D_n^{x,y}$ formed by the rows $X_1, X_2, \ldots, X_p$ and columns $Y_1, \ldots, Y_q$. By definition, each $X_i$ is disjoint from each $Y_j$; thus, choosing $S = \bigcup_{i=1}^{p} X_i$, we have that $R$ is a subrectangle of the rectangle corresponding to the bipartition $(S, \bar S)$, which shows that only rectangles corresponding to bipartitions can be maximal.
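On the other hand, any such rectangle is clearly maximal. Denoting $|S|$ by $\ell$, we get that the ratio offered by these rectangles is
$$\mu(k, x, y, \ell) = \frac{1}{\binom{\ell}{x}} + \frac{1}{\binom{k-\ell}{y}}.$$
Then setting $\ell^* = \ell^*(k, x, y) = \arg\min_{\ell :\, x \le \ell,\ y \le k - \ell} \mu(k, x, y, \ell)$ as the parameter of those rectangles offering the best possible ratio $\mu^* = \mu(k, x, y, \ell^*)$ for $D_n^{x,y}$, we obtain that
$$\mu^* \cdot \binom{k}{x}\binom{k-x}{y} = \frac{\binom{\ell^*}{x} + \binom{k-\ell^*}{y}}{\binom{\ell^*}{x}\binom{k-\ell^*}{y}} \cdot \binom{k}{x}\binom{k-x}{y}$$
is a lower bound for the cost of the optimal solution, as $D_n^{x,y}$ has $\binom{k}{x}\binom{k-x}{y}$ 1-entries.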


Observe that this bound is indeed attained by a fractional covering: each pair $(X, Y)$ with $|X| = x$, $|Y| = y$ and $X \cap Y = \emptyset$ is covered by exactly $\binom{k-x-y}{\ell^*-x}$ such rectangles (i.e., it respects this number of such bipartitions). Thus, considering the fractional covering which uses all such bipartitions with multiplicity $1/\binom{k-x-y}{\ell^*-x}$, we get a covering of $D_n^{x,y}$ with total cost
$$\frac{1}{\binom{k-x-y}{\ell^*-x}} \cdot \Bigl(\binom{\ell^*}{x} + \binom{k-\ell^*}{y}\Bigr) \cdot \binom{k}{\ell^*}$$
(that is, multiplicity × weight of a rectangle × number of these rectangles).

The last expression is the same as
$$\mu^* \cdot \binom{k}{x}\binom{k-x}{y},$$
since
$$\binom{k}{x}\binom{k-x}{y}\binom{k-x-y}{\ell^*-x} = \binom{k}{\ell^*}\binom{\ell^*}{x}\binom{k-\ell^*}{y}:$$
both of these products count the ways to choose an $\ell^*$-element subset $L$ of a $k$-element set $K$, an $x$-element subset $X$ of $L$, and a $y$-element subset $Y$ of $K - L$. The first formula achieves this by choosing $X$ from $K$ first, then $Y$ from $K - X$, and finally $L - X$ from $K - X - Y$; the second one by choosing $L$ from $K$ first, then $X$ from $L$, and finally $Y$ from $K - L$. Thus, choosing all these bipartitions with this multiplicity provides an optimal solution.
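Both the double-counting identity and the resulting equality between the cost of this fractional covering and $\mu \cdot \binom{k}{x}\binom{k-x}{y}$ can be confirmed with exact rational arithmetic for small parameters. The Python sketch below (illustrative; the helper names are ours) checks them for every $\ell$ with $x \le \ell \le k - y$ and also finds a minimizing $\ell^*$ by direct search.

```python
from fractions import Fraction
from math import comb


def mu(k, x, y, l):
    """Ratio 1/C(l, x) + 1/C(k - l, y) of a bipartition rectangle with |S| = l."""
    return Fraction(1, comb(l, x)) + Fraction(1, comb(k - l, y))


def uniform_cover_cost(k, x, y, l):
    """Multiplicity 1/C(k-x-y, l-x), times the cost of one rectangle, times their number."""
    rect_cost = comb(l, x) + comb(k - l, y)
    return Fraction(rect_cost * comb(k, l), comb(k - x - y, l - x))


def best_l(k, x, y):
    """A minimizer l* of mu over x <= l <= k - y (ties broken by the smaller l)."""
    return min(range(x, k - y + 1), key=lambda l: mu(k, x, y, l))


for k in range(1, 13):
    for x in range(k + 1):
        for y in range(k - x + 1):
            ones = comb(k, x) * comb(k - x, y)  # number of 1-entries of D_n^{x,y}
            for l in range(x, k - y + 1):
                # Double counting: both sides count the triples (L, X, Y).
                assert comb(k, x) * comb(k - x, y) * comb(k - x - y, l - x) == \
                    comb(k, l) * comb(l, x) * comb(k - l, y)
                assert uniform_cover_cost(k, x, y, l) == mu(k, x, y, l) * ones

print(best_l(9, 4, 3), mu(9, 4, 3, best_l(9, 4, 3)))
```

For $k = 9$, $x = 4$, $y = 3$ the search returns $\ell^* = 5 = x + (k-x-y)/2$, so in this small case the choice made in Lemma 9 happens to coincide with the optimal one.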


Note that for any fixed $\ell$, the weighted set covering problem using only the bipartitions $(S, \bar S)$ with $|S| = \ell$ is a uniform-cost, i.e., an unweighted set covering problem. On such a problem the greedy heuristic achieves a value within a factor of $1 + \log\bigl(\binom{\ell}{x}\binom{k-\ell}{y}\bigr) \le 1 + 2k = \mathrm{polylog}(n)$ of the optimum of the linear relaxation. Therefore, it suffices to pick some $\ell$ and construct a greedy covering using bipartitions into sets of size $\ell$ and $k - \ell$. Our choice of $\ell$ in Lemma 9 is $\ell = x + (k-x-y)/2$, and the argument above shows that the optimal choice, $\ell = \ell^*(k; x, y)$, will deliver an upper bound on $\mathrm{OR}_2(D_n)$ that is tight up to a polylogarithmic factor, thus reducing the problem to a parametric optimization task.


D Application: size of regular expressions


A regular expression over $\Sigma$ is a well-formed expression $r$ consisting of the symbols
$$\varepsilon,\ \emptyset,\ (,\ ),\ +,\ *, \text{ and } a \in \Sigma,$$
with the usual semantics (e.g., as in [?]).

The size of a regular expression $r$ can be specified in a number of different ways, but for our purposes, the easiest is the so-called alphabetic length, which is the number of symbols in $r$ belonging to $\Sigma$. For example, the alphabetic length of
$$r = a_0 a_1 + a_2 a_3 + (a_0 + a_1)(a_2 + a_3) \tag{4}$$
is 8.

Given a regular language $L$ specified in some way (for example, as the language accepted by a finite automaton), it is, in general, quite difficult to determine the size of the shortest regular expression specifying $L$. In fact, this problem is PSPACE-hard [?] and not even approximable within a factor of $o(n)$ [5] (unless P = PSPACE).
Extended example

In this subsection we examine a specific family of finite languages, namely
$$L_n = \bigcup_{0 \le i < j < n} \{a_i a_j\}$$
over the alphabet $\Sigma_n = \{a_0, a_1, \ldots, a_{n-1}\}$ of size $n$, and we provide matching upper and lower bounds on the size of the shortest regular expression for it. For example, for $n = 4$ this is the language
$$L_4 = \{a_0a_1, a_0a_2, a_0a_3, a_1a_2, a_1a_3, a_2a_3\}.$$

Evidently one can produce a regular expression for $L_n$ of length $n(n-1)$ by listing the elements of $L_n$, but it is possible to do much better. For example, the regular expression given in (4) specifies $L_4$ with alphabetic length 8, as opposed to length 12 using the brute-force approach.

Our upper and lower bounds follow from the corresponding corollary in the main text. For the lower bound, we relate the alphabetic length of regular expressions to the cost of coverings of Boolean matrices; for the upper bound, we provide a direct proof to make the connection between regular expressions and coverings more transparent.

We first show how to construct a small regular expression for $L_n$ through a simple divide-and-conquer strategy. We generalize to $L_{A,B} = \bigcup_{A \le i < j \le B} \{a_i a_j\}$, so that $L_n = L_{0,n-1}$.

Then our divide-and-conquer solution is given by
$$L_{A,B} = L_{A,C} \cup L_{C+1,B} \cup \{a_A, a_{A+1}, \ldots, a_C\} \cdot \{a_{C+1}, \ldots, a_B\},$$
where $C = \lfloor (A+B)/2 \rfloor$. The alphabetic length $t(n)$ of the regular expression so constructed satisfies the recurrence $t(1) = 0$, $t(2n) = 2t(n) + 2n$, and $t(2n+1) = t(n+1) + t(n) + 2n + 1$. Now an easy induction proves that in fact $t(n) = s(n)$, with $s(n) = n(\lfloor \log_2 n \rfloor + 2) - 2^{\lfloor \log_2 n \rfloor + 1}$.
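The divide-and-conquer construction is short enough to run directly. The Python sketch below (illustrative; the helper names and the rendering of letters as a0, a1, and so on are our own choices) returns the expression as a list of terms $(R, C)$, each standing for $(a_{r_1} + \cdots)(a_{c_1} + \cdots)$, and checks for small $n$ that it denotes exactly $L_n$ and has alphabetic length $s(n)$.

```python
def build(A, B):
    """Terms (R, C) of the expression for L_{A,B} = {a_i a_j : A <= i < j <= B}."""
    if A >= B:                      # at most one letter: the language is empty
        return []
    C = (A + B) // 2                # C = floor((A + B) / 2)
    cross = (list(range(A, C + 1)), list(range(C + 1, B + 1)))
    return build(A, C) + build(C + 1, B) + [cross]


def alphabetic_length(terms):
    return sum(len(R) + len(C) for R, C in terms)


def language(terms):
    """Words denoted by the expression; the word a_i a_j is represented as the pair (i, j)."""
    return {(i, j) for R, C in terms for i in R for j in C}


def render(terms):
    def group(idx):
        body = "+".join(f"a{i}" for i in idx)
        return body if len(idx) == 1 else f"({body})"
    return " + ".join(group(R) + group(C) for R, C in terms)


def s(n):
    L = n.bit_length() - 1          # floor(log2 n)
    return n * (L + 2) - 2 ** (L + 1)


for n in range(1, 65):
    terms = build(0, n - 1)
    assert language(terms) == {(i, j) for i in range(n) for j in range(i + 1, n)}
    assert alphabetic_length(terms) == s(n)
print(render(build(0, 3)))
```

For $n = 4$ this reproduces expression (4).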

We now turn to the lower bound. Let $r_n$ be a regular expression of shortest length for $L_n$, for $n \ge 2$. Clearly we can assume that $r_n$ contains no occurrence of the empty-set symbol $\emptyset$. Since $L_n$ is finite, we can also assume that $r_n$ contains no occurrence of $*$. So all the operators in $r_n$ are either union or concatenation. Consider any instance of concatenation, say $L_1 L_2$. Then if either $L_1$ or $L_2$ contains strings of two different lengths, the resulting concatenation would also, which is impossible since $L_n$ contains only strings of length 2. So all strings on one side of any concatenation are of the same length. On the other hand, no strings can be of length 3 or more, and if one side contains only strings of length 0 (the empty string) we could simply omit the concatenation. So in fact we may assume, without loss of generality, that any concatenation in $r_n$ looks like $R \cdot C$, where both languages are subsets of $\Sigma_n$. Finally, every letter in $C$ must be numbered higher than all those of $R$, for otherwise we would obtain a word not in $L_n$. This means that we can write $r_n$ as
$$R_1 \cdot C_1 + R_2 \cdot C_2 + \cdots + R_t \cdot C_t, \tag{5}$$
where we have inserted dots to make the concatenation explicit. The alphabetic length of this expression is $\sum_{i=1}^{t} (|R_i| + |C_i|)$.

We now create an integer program to minimize this length. Define $X_n = \{0, 1, \ldots, n-1\}$ and let $x_{R,C}$, for nonempty sets $R, C \subseteq X_n$, be an indicator variable for the presence of the term $R \cdot C$ in the expression (5): 1 if it is present and 0 otherwise. Our integer program is
$$\text{minimize} \sum_{\substack{R, C \text{ nonempty} \\ \max R < \min C}} (|R| + |C|)\, x_{R,C}$$
subject to the constraints
$$\begin{aligned}
& x_{R,C} \in \{0,1\} && \text{for nonempty } R, C \subseteq X_n \text{ with } \max R < \min C, \\
& \sum_{R \ni i,\ C \ni j} x_{R,C} \ge 1 && \text{for } 0 \le i < j < n.
\end{aligned}$$

The last constraint means that every string $a_i a_j$ with $i < j$ is covered by at least one concatenation of sets. Note that we write “$\ge 1$” in the last group of inequalities instead of “$= 1$”, because we are not insisting that our regular expression be unambiguous.

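For example, if $n = 3$, then the integer program is (writing, e.g., $x_{01,2}$ for $x_{\{0,1\},\{2\}}$)
$$\text{minimize } 2x_{0,1} + 2x_{0,2} + 2x_{1,2} + 3x_{01,2} + 3x_{0,12}$$
subject to the constraints
$$\begin{aligned}
& x_{0,1},\ x_{0,2},\ x_{1,2},\ x_{01,2},\ x_{0,12} \in \{0,1\}, \\
& x_{0,1} + x_{0,12} \ge 1, \\
& x_{0,2} + x_{01,2} + x_{0,12} \ge 1, \\
& x_{1,2} + x_{01,2} \ge 1.
\end{aligned}$$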
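For $n = 3$ the program is small enough to solve by exhaustive search. The Python sketch below (illustrative only) generates the five variables $x_{R,C}$ from the definition, enumerates all $2^5$ assignments, and reports the minimum cost of a feasible one, which should equal $s(3) = 5$.

```python
from itertools import combinations, product

n = 3
subsets = [c for r in range(1, n + 1) for c in combinations(range(n), r)]
# Variables x_{R,C}: nonempty R, C with max R < min C; the cost of a term is |R| + |C|.
terms = [(R, C) for R in subsets for C in subsets if max(R) < min(C)]
words = [(i, j) for i in range(n) for j in range(i + 1, n)]  # a_i a_j with i < j

best_cost, best_terms = None, None
for x in product((0, 1), repeat=len(terms)):
    chosen = [term for term, bit in zip(terms, x) if bit]
    # Covering constraint: every word a_i a_j lies in some chosen R . C.
    if all(any(i in R and j in C for R, C in chosen) for i, j in words):
        cost = sum(len(R) + len(C) for R, C in chosen)
        if best_cost is None or cost < best_cost:
            best_cost, best_terms = cost, chosen

print(len(terms), best_cost, best_terms)
```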


It is not difficult to see that our integer program is, in fact, the weighted set covering formulation, where the optimal value is $\mathrm{OR}_2(T_n)$ with $T_n$ the $n \times n$ full triangular matrix. So we can conclude from the corollary on $T_n$ that the smallest alphabetic length of a regular expression for the language $L_n$ is $s(n) = n(\lfloor \log_2 n \rfloor + 2) - 2^{\lfloor \log_2 n \rfloor + 1}$.


In what follows, we illustrate the approach taken in the main text by formulating the linear relaxation of the integer program above and taking its dual. This follows Figure 2 in the main text.

The integer program above is an instantiation of the one in Figure 2. We now relax the constraints on the $x_{R,C}$ to $0 \le x_{R,C} \le 1$. The dual linear program then has variables $y_{i,j}$ corresponding to the strings $a_i a_j$, for $0 \le i < j < n$; compare to Figure 2b. The corresponding dual, as in Figure 2c, is
$$\begin{aligned}
\text{maximize } & \sum_{0 \le i < j < n} y_{i,j} \\
\text{subject to } & y_{i,j} \ge 0 \quad \text{for } 0 \le i < j < n, \\
& \sum_{i \in R,\ j \in C} y_{i,j} \le |R| + |C| \quad \text{for nonempty } R, C \subseteq X_n \text{ with } \max R < \min C.
\end{aligned}$$


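For example, for $n = 3$ the corresponding dual is
$$\begin{aligned}
\text{maximize } & y_{0,1} + y_{0,2} + y_{1,2} \\
\text{subject to } & y_{0,1},\ y_{0,2},\ y_{1,2} \ge 0, \\
& y_{0,1} \le 2, \quad y_{0,2} \le 2, \quad y_{1,2} \le 2, \\
& y_{0,1} + y_{0,2} \le 3, \\
& y_{0,2} + y_{1,2} \le 3.
\end{aligned}$$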
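A matching dual solution certifies that the value 5 is optimal even for the linear relaxation. The Python sketch below checks that $y_{0,1} = 2$, $y_{0,2} = 1$, $y_{1,2} = 2$ (one feasible choice that we picked for illustration, not a solution taken from the text) satisfies all dual constraints for $n = 3$ and has objective value 5. By weak duality, the linear relaxation for $n = 3$ therefore has optimum exactly 5, the same as the optimum $s(3) = 5$ of the integer program.

```python
from itertools import combinations

n = 3
subsets = [c for r in range(1, n + 1) for c in combinations(range(n), r)]
rects = [(R, C) for R in subsets for C in subsets if max(R) < min(C)]
y = {(0, 1): 2, (0, 2): 1, (1, 2): 2}  # candidate dual solution (our choice)


def dual_feasible(y):
    """y >= 0 and, for every term R . C, the words it covers carry total weight <= |R| + |C|."""
    nonneg = all(v >= 0 for v in y.values())
    packing = all(sum(y[(i, j)] for i in R for j in C) <= len(R) + len(C) for R, C in rects)
    return nonneg and packing


print(dual_feasible(y), sum(y.values()))
```

Raising any single $y_{i,j}$ above this choice violates one of the five constraints, e.g., $y_{0,2} = 2$ breaks $y_{0,1} + y_{0,2} \le 3$.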
General connection

Whenever $L \subseteq \Sigma\Delta$ for the alphabets $\Sigma = \{1, \ldots, m\}$ and $\Delta = \{1, \ldots, n\}$, and $M_L$ is its characteristic $m \times n$ matrix ($(M_L)_{ij} = 1$ iff $ij \in L$), the following statements hold:

1. The value $\mathrm{OR}_2(M_L)$ coincides with the smallest possible alphabetic length of a regular expression for $L$.

2. The value $\mathrm{OR}_2(M_L)$ also coincides with the size of the smallest $\varepsilon$-free nondeterministic finite automaton (NFA) recognizing $L$.

3. The value $\mathrm{OR}(M_L) + m + n$ is an upper bound on the size of the smallest nondeterministic finite automaton with possible $\varepsilon$-transitions ($\varepsilon$-NFA) recognizing $L$.

The proof of the first statement follows the example above, and the last two statements can be found in [?].


